[jira] [Updated] (PIG-2586) A better plan/data flow visualizer

2013-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Avendaño updated PIG-2586:


Attachment: patch03

Patch to add visualize command. 
visualize -out out_folder [-script folder_to_script] alias

 A better plan/data flow visualizer
 --

 Key: PIG-2586
 URL: https://issues.apache.org/jira/browse/PIG-2586
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Allan Avendaño
  Labels: gsoc2013
 Attachments: patch03


 Pig supports a dot-graph-style plan to visualize the 
 logical/physical/mapreduce plan (explain with the -dot option, see 
 http://ofps.oreilly.com/titles/9781449302641/developing_and_testing.html). 
 However, the dot graph takes an extra step to generate the plan graph, and 
 the quality of the output is not good. It would be better to implement a 
 better visualizer for Pig. It should:
 1. show operator type and alias
 2. turn output schema on/off
 3. dive into the foreach inner plan on demand
 4. provide a way to show operator source code, e.g., a tooltip on an operator 
 (the plan doesn't currently have this information, but you can assume it is 
 in place)
 5. besides visualizing the logical/physical/mapreduce plan, visualizing the 
 script itself is also useful
 6. may rely on a Java graphics library such as Swing
 This is a candidate project for Google Summer of Code 2013. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2013
 The functionality implemented so far is available at 
 https://reviews.apache.org/r/12077/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2586) A better plan/data flow visualizer

2013-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Avendaño updated PIG-2586:


Attachment: graph01.zip

Example output folder for a script.

 A better plan/data flow visualizer
 --

 Key: PIG-2586
 URL: https://issues.apache.org/jira/browse/PIG-2586
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Allan Avendaño
  Labels: gsoc2013
 Attachments: graph01.zip, patch03, visualize.zip




[jira] [Updated] (PIG-2586) A better plan/data flow visualizer

2013-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Avendaño updated PIG-2586:


Attachment: visualize.zip

This file must be unzipped and placed inside the pig folder.
It serves as the templates folder for the visualize command.

 A better plan/data flow visualizer
 --

 Key: PIG-2586
 URL: https://issues.apache.org/jira/browse/PIG-2586
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Allan Avendaño
  Labels: gsoc2013
 Attachments: graph01.zip, patch03, visualize.zip




[jira] [Updated] (PIG-3445) Make Parquet format available out of the box in Pig

2013-09-08 Thread Lorand Bendig (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorand Bendig updated PIG-3445:
---

Attachment: PIG-3445.patch

This patch adds the parquet-pig related packages to the pig-withouthadoop and 
pig-withdependencies jars, and adds parquet.pig to the import search path.

 Make Parquet format available out of the box in Pig
 ---

 Key: PIG-3445
 URL: https://issues.apache.org/jira/browse/PIG-3445
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
 Attachments: PIG-3445.patch


 We would add the Parquet jar to the Pig packages to make it available out of 
 the box to Pig users.
 On top of that, we could add the parquet.pig package to the list of packages 
 to search for UDFs. (Alternatively, the Parquet jar could contain classes 
 named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
 This way, users can use Parquet simply by typing:
 A = LOAD 'foo' USING ParquetLoader();
 STORE A INTO 'bar' USING ParquetStorer();
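
For readers unfamiliar with the idea of a package search path for UDFs, here is a minimal sketch of how a short name such as ParquetLoader could be resolved by trying a list of package prefixes in order. The class ImportPathResolver and its method are hypothetical and are not Pig's actual resolver; JDK packages stand in for org.apache.pig.builtin and parquet.pig so the example runs without Pig on the classpath.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (an assumption for illustration, not Pig code):
// tries each package prefix in order until a class with the short name loads.
class ImportPathResolver {
    static Class<?> resolve(String shortName, List<String> packagePrefixes) {
        for (String prefix : packagePrefixes) {
            try {
                return Class.forName(prefix + shortName);
            } catch (ClassNotFoundException e) {
                // Not in this package; try the next prefix on the search path.
            }
        }
        throw new IllegalArgumentException("Could not resolve " + shortName);
    }

    public static void main(String[] args) {
        // The empty prefix means "treat the name as already fully qualified".
        List<String> searchPath = Arrays.asList("", "java.lang.", "java.util.");
        System.out.println(resolve("ArrayList", searchPath).getName());
    }
}
```

Under this scheme, adding one more prefix to the list is all it takes for a short name to become usable without a fully qualified class name.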



[jira] [Created] (PIG-3454) Add support for Boolean fields when using JsonStorage.

2013-09-08 Thread Erik Selin (JIRA)
Erik Selin created PIG-3454:
---

 Summary: Add support for Boolean fields when using JsonStorage.
 Key: PIG-3454
 URL: https://issues.apache.org/jira/browse/PIG-3454
 Project: Pig
  Issue Type: Improvement
Reporter: Erik Selin
Priority: Minor


Add support for Boolean fields when using JsonStorage.



[jira] [Updated] (PIG-3454) Add support for Boolean fields when using JsonStorage.

2013-09-08 Thread Erik Selin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Selin updated PIG-3454:


Patch Info: Patch Available

 Add support for Boolean fields when using JsonStorage.
 --

 Key: PIG-3454
 URL: https://issues.apache.org/jira/browse/PIG-3454
 Project: Pig
  Issue Type: Improvement
Reporter: Erik Selin
Priority: Minor
 Attachments: bugPig-3454.patch


 Add support for Boolean fields when using JsonStorage.



[jira] [Updated] (PIG-3454) Add support for Boolean fields when using JsonStorage.

2013-09-08 Thread Erik Selin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Selin updated PIG-3454:


Attachment: bugPig-3454.patch

 Add support for Boolean fields when using JsonStorage.
 --

 Key: PIG-3454
 URL: https://issues.apache.org/jira/browse/PIG-3454
 Project: Pig
  Issue Type: Improvement
Reporter: Erik Selin
Priority: Minor
 Attachments: bugPig-3454.patch


 Add support for Boolean fields when using JsonStorage.



[jira] Subscription: PIG patch available

2013-09-08 Thread jira
Issue Subscription
Filter: PIG patch available (21 issues)

Subscriber: pigdaily

Key         Summary
PIG-3451    EvalFunc<T> ctor reflection to determine value of type param T is brittle
            https://issues.apache.org/jira/browse/PIG-3451
PIG-3449    Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
            https://issues.apache.org/jira/browse/PIG-3449
PIG-3448    Tez backend layout
            https://issues.apache.org/jira/browse/PIG-3448
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3434    Null subexpression in bincond nullifies outer tuple (or bag)
            https://issues.apache.org/jira/browse/PIG-3434
PIG-3431    Return more information for parsing related exceptions.
            https://issues.apache.org/jira/browse/PIG-3431
PIG-3430    Add xml format for explaining MapReduce Plan.
            https://issues.apache.org/jira/browse/PIG-3430
PIG-3426    Add support for removing s3 files
            https://issues.apache.org/jira/browse/PIG-3426
PIG-3390    Make pig working with HBase 0.95
            https://issues.apache.org/jira/browse/PIG-3390
PIG-3388    No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
            https://issues.apache.org/jira/browse/PIG-3388
PIG-3367    Add assert keyword (operator) in pig
            https://issues.apache.org/jira/browse/PIG-3367
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3255    Avoid extra byte array copy in streaming deserialize
            https://issues.apache.org/jira/browse/PIG-3255
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-08 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761566#comment-13761566
 ] 

Rohini Palaniswamy commented on PIG-3255:
-

If the interface change is OK, I am also thinking of changing the 
PigToStream.java interface from

public byte[] serialize(Tuple t) throws IOException;

to 

public DataBuffer serialize(Tuple t) throws IOException;
 
where DataBuffer would be the same as 
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java?revision=1306187&view=markup

I don't want to use DataOutputBuffer itself, as it is marked 
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Unstable

This would get rid of one more byte array copy. Thoughts?
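
To make the proposal concrete, here is a minimal sketch of what such a buffer could look like, assuming it follows the same pattern as Hadoop's DataOutputBuffer: a growable byte array whose backing storage and valid length are exposed directly, so the caller can read the serialized tuple in place instead of receiving a trimmed copy. The class name DataBufferSketch and its two accessors are assumptions for illustration, not the code from the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the proposed DataBuffer (names are assumptions).
// ByteArrayOutputStream already keeps a growable array in the protected
// fields buf and count; this subclass exposes them without copying.
class DataBufferSketch extends ByteArrayOutputStream {
    /** Returns the backing array itself; only the first getLength() bytes are valid. */
    public byte[] getData() {
        return buf;
    }

    /** Returns the number of valid bytes in getData(). */
    public int getLength() {
        return count;
    }

    public static void main(String[] args) {
        DataBufferSketch out = new DataBufferSketch();
        byte[] row = "1\tpig\n".getBytes(StandardCharsets.UTF_8);
        out.write(row, 0, row.length);

        // The consumer writes the valid prefix in place. Calling toByteArray()
        // here instead would allocate and fill a second array.
        System.out.write(out.getData(), 0, out.getLength());
        System.out.flush();
    }
}
```

The trade-off is that callers must respect getLength(), since the backing array is usually longer than the data it holds.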


 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch


 PigStreaming.java:
     public Tuple deserialize(byte[] bytes) throws IOException {
         Text val = new Text(bytes);
         return StorageUtil.textToTuple(val, fieldDel);
     }
 We should remove the new Text(bytes) copy and construct the tuple directly 
 from the bytes.
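
As an illustration of the idea (not the attached patch), the sketch below splits a record on a delimiter by scanning the input array directly, so no intermediate copy of the whole record is made first. The class DirectFieldSplit and its method are hypothetical, and plain Strings stand in for Pig's tuple fields so the example runs without Pig or Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (an assumption for illustration): fields are decoded
// straight from the caller's array; only the per-field values are allocated.
class DirectFieldSplit {
    static List<String> splitFields(byte[] bytes, int length, byte delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < length; i++) {
            if (bytes[i] == delimiter) {
                fields.add(new String(bytes, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // The last field (possibly empty) follows the final delimiter.
        fields.add(new String(bytes, start, length - start, StandardCharsets.UTF_8));
        return fields;
    }

    public static void main(String[] args) {
        byte[] record = "alice\t42\tNY".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitFields(record, record.length, (byte) '\t'));
    }
}
```

Taking an explicit length rather than relying on bytes.length lets the same method work on a reused buffer whose valid data is shorter than the array.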



Review Request 14030: [PIG-3255] Avoid extra byte array copy in streaming deserialize

2013-09-08 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14030/
---

Review request for pig.


Bugs: PIG-3255
https://issues.apache.org/jira/browse/PIG-3255


Repository: pig


Description
---

Involves a backward-incompatible interface change.


Diffs
-

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigToStream.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/StreamToPig.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/PigStreaming.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DataBuffer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/streaming/InputHandler.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/streaming/OutputHandler.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/StorageUtil.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/streaming/DumpStreamer.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/streaming/DumpStreamerBad.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/streaming/StreamingDump.java 1518333 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestStreaming.java 1518333 

Diff: https://reviews.apache.org/r/14030/diff/


Testing
---

No new unit tests; the changes only affect performance. The TestStreaming tests pass.


Thanks,

Rohini Palaniswamy



[jira] [Updated] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-08 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3255:


Attachment: PIG-3255-3.patch

https://reviews.apache.org/r/14030/

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch

