[jira] [Updated] (PIG-2586) A better plan/data flow visualizer
[ https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Avendaño updated PIG-2586:
    Attachment: patch03

Patch to add the visualize command. Usage:

    visualize -out out_folder [-script folder_to_script] alias

A better plan/data flow visualizer
----------------------------------
    Key: PIG-2586
    URL: https://issues.apache.org/jira/browse/PIG-2586
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Reporter: Daniel Dai
    Assignee: Allan Avendaño
    Labels: gsoc2013
    Attachments: patch03

Pig supports a dot-graph-style plan to visualize the logical/physical/mapreduce plan (explain with the -dot option, see http://ofps.oreilly.com/titles/9781449302641/developing_and_testing.html). However, the dot graph takes an extra step to generate the plan graph, and the quality of the output is not good. It would be better to implement a dedicated visualizer for Pig. It should:

1. show operator type and alias
2. turn the output schema on/off
3. dive into the foreach inner plan on demand
4. provide a way to show operator source code, e.g., a tooltip on an operator (the plan does not currently have this information, but you can assume it is in place)
5. besides visualizing the logical/physical/mapreduce plan, visualizing the script itself is also useful
6. may rely on a Java graphics library such as Swing

This is a candidate project for Google Summer of Code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013

Functionality implemented so far is available at https://reviews.apache.org/r/12077/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
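For readers unfamiliar with the dot output mentioned in the issue, the following is a minimal sketch of how requirements 1 and 2 (operator type and alias, with a schema toggle) could map onto Graphviz DOT text. It is not taken from the attached patch: the class PlanDotSketch, its methods, and the sample operators are hypothetical, and it assumes labels contain no double quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: renders a tiny operator graph as Graphviz DOT text. */
final class PlanDotSketch {
    /** One plan node: operator type, alias, and output schema (all plain strings here). */
    private static final class Op {
        final String type;
        final String alias;
        final String schema;

        Op(String type, String alias, String schema) {
            this.type = type;
            this.alias = alias;
            this.schema = schema;
        }
    }

    private final List<Op> ops = new ArrayList<>();
    private final List<int[]> edges = new ArrayList<>();

    /** Adds an operator and returns its node id. */
    int addOp(String type, String alias, String schema) {
        ops.add(new Op(type, alias, schema));
        return ops.size() - 1;
    }

    /** Records a data-flow edge between two node ids. */
    void connect(int from, int to) {
        edges.add(new int[] {from, to});
    }

    /** Builds the DOT text; the schema line is included only when showSchema is true. */
    String toDot(boolean showSchema) {
        StringBuilder sb = new StringBuilder("digraph plan {\n");
        for (int i = 0; i < ops.size(); i++) {
            Op op = ops.get(i);
            String label = op.type + ": " + op.alias;
            if (showSchema) {
                label += "\\n" + op.schema; // "\n" is DOT's line break inside a label
            }
            sb.append("  n").append(i).append(" [label=\"").append(label).append("\"];\n");
        }
        for (int[] e : edges) {
            sb.append("  n").append(e[0]).append(" -> n").append(e[1]).append(";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        PlanDotSketch plan = new PlanDotSketch();
        int load = plan.addOp("LOLoad", "A", "(x:int)");     // illustrative operators
        int filter = plan.addOp("LOFilter", "B", "(x:int)");
        plan.connect(load, filter);
        System.out.print(plan.toDot(true));
    }
}
```

The printed text still has to be passed through Graphviz to become an image, which is the extra step the issue wants to avoid.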
[jira] [Updated] (PIG-2586) A better plan/data flow visualizer
[ https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Avendaño updated PIG-2586:
    Attachment: graph01.zip

Example output folder of a script.

    Attachments: graph01.zip, patch03, visualize.zip
[jira] [Updated] (PIG-2586) A better plan/data flow visualizer
[ https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Avendaño updated PIG-2586:
    Attachment: visualize.zip

This file must be unzipped and placed inside the pig folder. It serves as the templates folder for the visualize command.

    Attachments: graph01.zip, patch03, visualize.zip
[jira] [Updated] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lorand Bendig updated PIG-3445:
    Attachment: PIG-3445.patch

This patch adds the parquet-pig related packages to the pig-withouthadoop and pig-withdependencies jars, and adds parquet.pig to the import search path.

Make Parquet format available out of the box in Pig
---------------------------------------------------
    Key: PIG-3445
    URL: https://issues.apache.org/jira/browse/PIG-3445
    Project: Pig
    Issue Type: Improvement
    Reporter: Julien Le Dem
    Attachments: PIG-3445.patch

We would add the Parquet jar to the Pig packages to make it available out of the box to Pig users. On top of that, we could add the parquet.pig package to the list of packages searched for UDFs. (Alternatively, the Parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)

This way users can use Parquet simply by typing:

    A = LOAD 'foo' USING ParquetLoader();
    STORE A INTO 'bar' USING ParquetStorer();
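The idea of a package search list can be sketched as follows: a short name is tried against each package prefix in turn, and the first class that loads wins. This is only an illustration of the general technique, not Pig's actual resolver; ShortNameResolver is a hypothetical name, and JDK packages stand in for org.apache.pig.builtin and parquet.pig so that the example runs without Pig on the classpath.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of short-name resolution over a package search list. */
final class ShortNameResolver {
    /** Tries each package prefix in order and returns the first class that loads. */
    static Class<?> resolve(String shortName, List<String> packagePrefixes)
            throws ClassNotFoundException {
        for (String prefix : packagePrefixes) {
            try {
                return Class.forName(prefix + shortName);
            } catch (ClassNotFoundException ignored) {
                // Not in this package; fall through to the next prefix.
            }
        }
        throw new ClassNotFoundException("Could not resolve " + shortName);
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the real search list; appending one more prefix is all
        // it takes to make another package's classes usable by short name.
        List<String> searchPath = Arrays.asList("java.io.", "java.util.");
        System.out.println(resolve("ArrayList", searchPath).getName());
    }
}
```

Under this scheme, order matters: if two packages on the list define the same short name, the earlier prefix shadows the later one.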
[jira] [Created] (PIG-3454) Add support for Boolean fields when using JsonStorage.
Erik Selin created PIG-3454:
----------------------------
    Summary: Add support for Boolean fields when using JsonStorage.
    Key: PIG-3454
    URL: https://issues.apache.org/jira/browse/PIG-3454
    Project: Pig
    Issue Type: Improvement
    Reporter: Erik Selin
    Priority: Minor

Add support for Boolean fields when using JsonStorage.
[jira] [Updated] (PIG-3454) Add support for Boolean fields when using JsonStorage.
[ https://issues.apache.org/jira/browse/PIG-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Selin updated PIG-3454:
    Patch Info: Patch Available

    Attachments: bugPig-3454.patch
[jira] [Updated] (PIG-3454) Add support for Boolean fields when using JsonStorage.
[ https://issues.apache.org/jira/browse/PIG-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Selin updated PIG-3454:
    Attachment: bugPig-3454.patch

    Attachments: bugPig-3454.patch
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (21 issues)
Subscriber: pigdaily

Key         Summary
PIG-3451    EvalFunc<T> ctor reflection to determine value of type param T is brittle
            https://issues.apache.org/jira/browse/PIG-3451
PIG-3449    Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
            https://issues.apache.org/jira/browse/PIG-3449
PIG-3448    Tez backend layout
            https://issues.apache.org/jira/browse/PIG-3448
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3434    Null subexpression in bincond nullifies outer tuple (or bag)
            https://issues.apache.org/jira/browse/PIG-3434
PIG-3431    Return more information for parsing related exceptions.
            https://issues.apache.org/jira/browse/PIG-3431
PIG-3430    Add xml format for explaining MapReduce Plan.
            https://issues.apache.org/jira/browse/PIG-3430
PIG-3426    Add support for removing s3 files
            https://issues.apache.org/jira/browse/PIG-3426
PIG-3390    Make pig working with HBase 0.95
            https://issues.apache.org/jira/browse/PIG-3390
PIG-3388    No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
            https://issues.apache.org/jira/browse/PIG-3388
PIG-3367    Add assert keyword (operator) in pig
            https://issues.apache.org/jira/browse/PIG-3367
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3255    Avoid extra byte array copy in streaming deserialize
            https://issues.apache.org/jira/browse/PIG-3255
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761566#comment-13761566 ]

Rohini Palaniswamy commented on PIG-3255:
-----------------------------------------

If the interface change is OK, then I am thinking of changing even the PigToStream.java interface from

    public byte[] serialize(Tuple t) throws IOException;

to

    public DataBuffer serialize(Tuple t) throws IOException;

where DataBuffer will be the same as http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java?revision=1306187&view=markup

I don't want to use DataOutputBuffer itself, as it is marked

    @InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
    @InterfaceStability.Unstable

This will get rid of one more byte array copy. Thoughts?

Avoid extra byte array copy in streaming deserialize
----------------------------------------------------
    Key: PIG-3255
    URL: https://issues.apache.org/jira/browse/PIG-3255
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.11
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
    Fix For: 0.12
    Attachments: PIG-3255-1.patch, PIG-3255-2.patch

PigStreaming.java:

    public Tuple deserialize(byte[] bytes) throws IOException {
        Text val = new Text(bytes);
        return StorageUtil.textToTuple(val, fieldDel);
    }

We should remove the new Text(bytes) copy and construct the tuple directly from the bytes.
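To illustrate why returning a buffer object instead of byte[] can save a copy, here is a minimal sketch built only on the JDK. It is not the DataBuffer class from the patch: SketchBuffer and SketchBufferDemo are hypothetical names, and the tab-separated serializer is a stand-in for real tuple serialization. The point is that the caller reads the backing array together with a valid length, rather than asking for a right-sized copy via toByteArray().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the proposed buffer type; exposes the backing array. */
final class SketchBuffer extends ByteArrayOutputStream {
    /** Returns the backing array without copying; only the first getLength() bytes are valid. */
    byte[] getData() {
        return buf;
    }

    /** Number of valid bytes in getData(). */
    int getLength() {
        return count;
    }
}

final class SketchBufferDemo {
    /** Illustrative serializer: tab-separated fields followed by a newline. */
    static SketchBuffer serialize(String[] fields) throws IOException {
        SketchBuffer out = new SketchBuffer();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.write('\t');
            }
            out.write(fields[i].getBytes(StandardCharsets.UTF_8));
        }
        out.write('\n');
        return out;
    }

    /** Writes the valid region straight to the sink; no intermediate byte[] is allocated. */
    static void writeTo(SketchBuffer buffer, OutputStream sink) throws IOException {
        sink.write(buffer.getData(), 0, buffer.getLength());
    }

    public static void main(String[] args) throws IOException {
        SketchBuffer buffer = serialize(new String[] {"a", "bc"});
        writeTo(buffer, System.out);
    }
}
```

The trade-off is that callers must honor getLength(): the backing array is usually larger than the data it holds, so treating it as a complete message would append stale bytes.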
Review Request 14030: [PIG-3255] Avoid extra byte array copy in streaming deserialize
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14030/
-----------------------------------------------------------

Review request for pig.

Bugs: PIG-3255
    https://issues.apache.org/jira/browse/PIG-3255

Repository: pig

Description
-------

Involves a backward-incompatible interface change.

Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigToStream.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/StreamToPig.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/PigStreaming.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DataBuffer.java PRE-CREATION
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/streaming/InputHandler.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/streaming/OutputHandler.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/StorageUtil.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/streaming/DumpStreamer.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/streaming/DumpStreamerBad.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/streaming/StreamingDump.java 1518333
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestStreaming.java 1518333

Diff: https://reviews.apache.org/r/14030/diff/

Testing
-------

No new unit tests; the changes only affect performance. The TestStreaming tests pass.

Thanks,

Rohini Palaniswamy
[jira] [Updated] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3255:
    Attachment: PIG-3255-3.patch

https://reviews.apache.org/r/14030/

    Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
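The goal stated in PIG-3255, building fields directly from the incoming bytes instead of first copying them into a Text, can be sketched with plain JDK types. This is not the code in PIG-3255-3.patch: ByteFieldSplitter is a hypothetical name, fields are returned as strings rather than Pig tuple fields, and the delimiter is assumed to be a single ASCII byte (which cannot occur inside a multi-byte UTF-8 sequence, so a byte-wise scan is safe).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits a byte range on a delimiter without copying the whole input first. */
final class ByteFieldSplitter {
    /**
     * Scans the first length bytes of bytes and returns one string per
     * delimiter-separated field. Bytes are copied once per field; there is no
     * intermediate copy of the entire buffer.
     */
    static List<String> split(byte[] bytes, int length, byte delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < length; i++) {
            if (bytes[i] == delimiter) {
                fields.add(new String(bytes, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // The last field runs from the final delimiter to the end of the valid range.
        fields.add(new String(bytes, start, length - start, StandardCharsets.UTF_8));
        return fields;
    }

    public static void main(String[] args) {
        byte[] line = "a\tb\t\tc".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(line, line.length, (byte) '\t'));
    }
}
```

Taking an explicit length rather than relying on bytes.length lets the same method work on a reused buffer whose backing array is larger than the current record.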