[jira] [Commented] (PIG-3095) which is called many, many times for each Pig STREAM statement
[ https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533301#comment-13533301 ]

Cheolsoo Park commented on PIG-3095:

+1. I will run the e2e tests to be safe and commit your patch unless I see any failures. Thank you!

which is called many, many times for each Pig STREAM statement

Key: PIG-3095
URL: https://issues.apache.org/jira/browse/PIG-3095
Project: Pig
Issue Type: Bug
Components: grunt, impl
Affects Versions: 0.12
Reporter: Nick White
Assignee: Nick White
Labels: patch, performance
Fix For: 0.12
Attachments: PIG-3095.1.patch, PIG-3095.patch

STREAM statements are checked by the LogicalPlanBuilder as it comes across them, and these checks include running the system utility {{which}}. However, due to the backtracking parsing mechanism, {{which}} is called repeatedly with the same arguments (I noticed this while profiling a script with 4 STREAM statements: {{which}} was run over 230 times!). The attached patch just caches the return value of {{which}}, reducing the overhead of running a system process to a Map lookup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
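The caching idea described in the issue can be sketched roughly as follows. This is not the attached patch: the class name ExecutableLookupCache, the resolver function, and the use of Java 8 APIs are all assumptions made for illustration. The resolver stands in for whatever code spawns the system utility.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: memoizes an expensive executable lookup by name. */
final class ExecutableLookupCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver;

    /** The resolver stands in for the code that actually spawns the lookup process. */
    ExecutableLookupCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /**
     * Returns the resolved path, invoking the resolver only on the first
     * request for a given name. A resolver that returns null is not cached.
     */
    String resolve(String executable) {
        return cache.computeIfAbsent(executable, resolver);
    }

    public static void main(String[] args) {
        AtomicInteger processLaunches = new AtomicInteger();
        ExecutableLookupCache cache = new ExecutableLookupCache(name -> {
            processLaunches.incrementAndGet();   // pretend a process is spawned here
            return "/usr/bin/" + name;           // made-up path for the demo
        });
        // Simulate the parser backtracking over the same STREAM statement.
        for (int i = 0; i < 230; i++) {
            cache.resolve("perl");
        }
        System.out.println(processLaunches.get()); // prints 1
    }
}
```

With a cache like this, repeated checks of the same command cost one map lookup each after the first call, which is the effect the issue description attributes to the patch.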
[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533484#comment-13533484 ]

Jonathan Coveney commented on PIG-2764:
---

I would have no problem tidying this up for inclusion, but all of the comments so far are pretty vague. I would love some concrete feedback on whether or not this deserves to be in Pig, and how it should or could be different. More broadly, I think it would be huge to make type support a lot easier to add, but that's a separate JIRA.

Add a biginteger and bigdecimal type to pig
---

Key: PIG-2764
URL: https://issues.apache.org/jira/browse/PIG-2764
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch

I think it would be useful for applications where precision is more important than speed to have the option of using Java's BigDecimal and BigInteger types natively.
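For readers unfamiliar with the precision trade-off the issue refers to, here is a small plain-Java illustration. It does not use any Pig API, and the class and method names are made up for the example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical demo class contrasting primitive and arbitrary-precision arithmetic. */
final class PrecisionDemo {
    /** Adds 0.1 ten times using binary floating point; the result is not exactly 1.0. */
    static double sumWithDouble() {
        double total = 0.0;
        for (int i = 0; i < 10; i++) {
            total += 0.1;
        }
        return total;
    }

    /** Adds 0.1 ten times using exact decimal arithmetic; the result equals 1. */
    static BigDecimal sumWithBigDecimal() {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal("0.1"); // string constructor keeps the value exact
        for (int i = 0; i < 10; i++) {
            total = total.add(step);
        }
        return total;
    }

    /** Computes Long.MAX_VALUE + 1 without the wrap-around a primitive long would show. */
    static BigInteger onePastLongMax() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(sumWithDouble() == 1.0);                             // false
        System.out.println(sumWithBigDecimal().compareTo(BigDecimal.ONE) == 0); // true
        System.out.println(onePastLongMax());
    }
}
```

The exact types avoid rounding and overflow at the cost of object allocation and slower arithmetic, which is the "precision over speed" trade-off the description mentions.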
[jira] [Updated] (PIG-3095) which is called many, many times for each Pig STREAM statement
[ https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3095:
---

Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Nick!

which is called many, many times for each Pig STREAM statement

Key: PIG-3095
URL: https://issues.apache.org/jira/browse/PIG-3095
Project: Pig
Issue Type: Bug
Components: grunt, impl
Affects Versions: 0.12
Reporter: Nick White
Assignee: Nick White
Labels: patch, performance
Fix For: 0.12
Attachments: PIG-3095.1.patch, PIG-3095.patch

STREAM statements are checked by the LogicalPlanBuilder as it comes across them, and these checks include running the system utility {{which}}. However, due to the backtracking parsing mechanism, {{which}} is called repeatedly with the same arguments (I noticed this while profiling a script with 4 STREAM statements: {{which}} was run over 230 times!). The attached patch just caches the return value of {{which}}, reducing the overhead of running a system process to a Map lookup.
Build failed in Jenkins: Pig-trunk #1379
See https://builds.apache.org/job/Pig-trunk/1379/changes

Changes:

[cheolsoo] PIG-3095: which is called many, many times for each Pig STREAM statement (nwhite via cheolsoo)

--
[...truncated 6587 lines...]
[findbugs] jline.History
[findbugs] org.jruby.embed.internal.LocalContextProvider
[findbugs] org.apache.hadoop.io.BooleanWritable
[findbugs] org.apache.log4j.Logger
[findbugs] org.apache.hadoop.hbase.filter.FamilyFilter
[findbugs] org.codehaus.jackson.annotate.JsonPropertyOrder
[findbugs] groovy.lang.Tuple
[findbugs] org.antlr.runtime.IntStream
[findbugs] org.apache.hadoop.util.ReflectionUtils
[findbugs] org.apache.hadoop.fs.ContentSummary
[findbugs] org.jruby.runtime.builtin.IRubyObject
[findbugs] org.jruby.RubyInteger
[findbugs] org.python.core.PyTuple
[findbugs] org.mortbay.log.Log
[findbugs] org.apache.hadoop.conf.Configuration
[findbugs] com.google.common.base.Joiner
[findbugs] org.apache.hadoop.mapreduce.lib.input.FileSplit
[findbugs] org.apache.hadoop.mapred.Counters$Counter
[findbugs] com.jcraft.jsch.Channel
[findbugs] org.apache.hadoop.mapred.JobPriority
[findbugs] org.apache.commons.cli.Options
[findbugs] org.apache.hadoop.mapred.JobID
[findbugs] org.apache.hadoop.util.bloom.BloomFilter
[findbugs] org.python.core.PyFrame
[findbugs] org.apache.hadoop.hbase.filter.CompareFilter
[findbugs] org.apache.hadoop.util.VersionInfo
[findbugs] org.python.core.PyString
[findbugs] org.apache.hadoop.io.Text$Comparator
[findbugs] org.jruby.runtime.Block
[findbugs] org.antlr.runtime.MismatchedSetException
[findbugs] org.apache.hadoop.io.BytesWritable
[findbugs] org.apache.hadoop.fs.FsShell
[findbugs] org.joda.time.Months
[findbugs] org.mozilla.javascript.ImporterTopLevel
[findbugs] org.apache.hadoop.hbase.mapreduce.TableOutputFormat
[findbugs] org.apache.hadoop.mapred.TaskReport
[findbugs] org.apache.hadoop.security.UserGroupInformation
[findbugs] org.antlr.runtime.tree.RewriteRuleSubtreeStream
[findbugs] org.apache.commons.cli.HelpFormatter
[findbugs] com.google.common.collect.Maps
[findbugs] org.joda.time.ReadableInstant
[findbugs] org.mozilla.javascript.NativeObject
[findbugs] org.apache.hadoop.hbase.HConstants
[findbugs] org.apache.hadoop.io.serializer.Deserializer
[findbugs] org.antlr.runtime.FailedPredicateException
[findbugs] org.apache.hadoop.io.compress.CompressionCodec
[findbugs] org.jruby.RubyNil
[findbugs] org.apache.hadoop.fs.FileStatus
[findbugs] org.apache.hadoop.hbase.client.Result
[findbugs] org.apache.hadoop.mapreduce.JobContext
[findbugs] org.codehaus.jackson.JsonGenerator
[findbugs] org.apache.hadoop.mapreduce.TaskAttemptContext
[findbugs] org.apache.hadoop.io.LongWritable$Comparator
[findbugs] org.codehaus.jackson.map.util.LRUMap
[findbugs] org.apache.hadoop.hbase.util.Bytes
[findbugs] org.antlr.runtime.MismatchedTokenException
[findbugs] org.codehaus.jackson.JsonParser
[findbugs] com.jcraft.jsch.UserInfo
[findbugs] org.apache.hadoop.hbase.filter.WhileMatchFilter
[findbugs] org.python.core.PyException
[findbugs] org.apache.commons.cli.ParseException
[findbugs] org.apache.hadoop.io.compress.CompressionOutputStream
[findbugs] org.apache.hadoop.hbase.filter.WritableByteArrayComparable
[findbugs] org.antlr.runtime.tree.CommonTreeNodeStream
[findbugs] org.apache.log4j.Level
[findbugs] org.apache.hadoop.hbase.client.Scan
[findbugs] org.jruby.anno.JRubyMethod
[findbugs] org.apache.hadoop.mapreduce.Job
[findbugs] com.google.common.util.concurrent.Futures
[findbugs] org.apache.commons.logging.LogFactory
[findbugs] org.apache.commons.collections.IteratorUtils
[findbugs] org.apache.commons.codec.binary.Base64
[findbugs] org.codehaus.jackson.map.ObjectMapper
[findbugs] org.apache.hadoop.fs.FileSystem
[findbugs] org.jruby.embed.LocalContextScope
[findbugs] org.apache.hadoop.hbase.filter.FilterList$Operator
[findbugs] org.jruby.RubySymbol
[findbugs] org.codehaus.jackson.map.annotate.JacksonStdImpl
[findbugs] org.apache.hadoop.hbase.io.ImmutableBytesWritable
[findbugs] org.apache.hadoop.io.serializer.SerializationFactory
[findbugs] org.antlr.runtime.tree.TreeAdaptor
[findbugs] org.apache.hadoop.mapred.RunningJob
[findbugs] org.antlr.runtime.CommonTokenStream
[findbugs] org.apache.hadoop.io.DataInputBuffer
[findbugs] org.apache.hadoop.io.file.tfile.TFile
[findbugs] org.apache.commons.cli.GnuParser
[findbugs] org.mozilla.javascript.Context
[findbugs] org.apache.hadoop.io.FloatWritable
[findbugs] org.antlr.runtime.tree.RewriteEarlyExitException
[findbugs] org.apache.hadoop.hbase.HBaseConfiguration
[findbugs] org.codehaus.jackson.JsonGenerationException
[findbugs] org.apache.hadoop.mapreduce.TaskInputOutputContext
[findbugs] org.apache.hadoop.io.compress.GzipCodec
[findbugs]
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (35 issues)
Subscriber: pigdaily

Key         Summary
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3086    Allow A Prefix To Be Added To URIs In PigUnit Tests
            https://issues.apache.org/jira/browse/PIG-3086
PIG-3078    Make a UDF that, given a string, returns just the columns prefixed by that string
            https://issues.apache.org/jira/browse/PIG-3078
PIG-3073    POUserFunc creating log spam for large scripts
            https://issues.apache.org/jira/browse/PIG-3073
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3067    HBaseStorage should be split up to become more managable
            https://issues.apache.org/jira/browse/PIG-3067
PIG-3066    Fix TestPigRunner in trunk
            https://issues.apache.org/jira/browse/PIG-3066
PIG-3057    make readField protected to be able to override it if we extend PigStorage
            https://issues.apache.org/jira/browse/PIG-3057
PIG-3051    java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
            https://issues.apache.org/jira/browse/PIG-3051
PIG-3029    TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution
            https://issues.apache.org/jira/browse/PIG-3029
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2878    Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
            https://issues.apache.org/jira/browse/PIG-2878
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2645    PigSplit does not handle the case where SerializationFactory returns null
            https://issues.apache.org/jira/browse/PIG-2645
PIG-2614    AvroStorage crashes on LOADING a single bad error
            https://issues.apache.org/jira/browse/PIG-2614
PIG-2507    Semicolon in paramenters for UDF results in parsing error
            https://issues.apache.org/jira/browse/PIG-2507
PIG-2433    Jython import module not working if module path is in classpath
            https://issues.apache.org/jira/browse/PIG-2433
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2362    Rework Ant build.xml to use macrodef instead of antcall
            https://issues.apache.org/jira/browse/PIG-2362
PIG-2312    NPE when relation and column share the same name and used in Nested Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
[jira] [Created] (PIG-3096) Make PigUnit thread safe
Cheolsoo Park created PIG-3096:
--

Summary: Make PigUnit thread safe
Key: PIG-3096
URL: https://issues.apache.org/jira/browse/PIG-3096
Project: Pig
Issue Type: Bug
Affects Versions: 0.11
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.12

Currently, {{PigUnit}} is not thread-safe because {{Cluster}} and {{PigServer}} are declared as static. Converting them to {{ThreadLocal}} allows PigUnit to run in a multi-threaded environment.
Review Request: PIG-3096 Make PigUnit thread safe
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8631/
---

Review request for pig and Santhosh Srinivasan.

Description
---

Currently, PigUnit is not thread-safe because Cluster and PigServer are declared as static. Converting them to ThreadLocal allows PigUnit to run in a multi-threaded environment.

This addresses bug PIG-3096.
https://issues.apache.org/jira/browse/PIG-3096

Diffs
-

test/org/apache/pig/pigunit/PigTest.java 50a5c79

Diff: https://reviews.apache.org/r/8631/diff/

Testing
---

ant test -Dtestcase=TestPigTest

I also tested it by running multiple PigUnit cases in parallel with tempus-fugit (http://tempusfugitlibrary.org/documentation/junit/parallel/) on a real cluster.

Thanks,

Cheolsoo Park
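The static-to-ThreadLocal conversion described above might look roughly like the sketch below. It is not the actual diff to PigTest.java: FakeServer is a hypothetical stand-in for a resource such as PigServer, the holder class is invented for the example, and ThreadLocal.withInitial assumes Java 8 or later.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-test resource; this is not Pig's PigServer. */
final class FakeServer {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.incrementAndGet();
}

/** Hypothetical holder showing a shared static field replaced by a ThreadLocal. */
final class PerThreadServerHolder {
    // Before: "private static FakeServer server;" was shared by every thread,
    // so parallel tests could overwrite each other's instance.
    // After: each thread lazily creates and keeps its own instance.
    private static final ThreadLocal<FakeServer> SERVER =
            ThreadLocal.withInitial(FakeServer::new);

    static FakeServer get() {
        return SERVER.get();
    }

    public static void main(String[] args) throws InterruptedException {
        FakeServer mine = get();
        Thread other = new Thread(() ->
                // A different thread sees a different instance.
                System.out.println("other thread shares instance: " + (get() == mine)));
        other.start();
        other.join();
        // Repeated calls on the same thread return the same instance.
        System.out.println("same thread reuses instance: " + (get() == mine));
    }
}
```

One design consequence worth noting is that per-thread instances live as long as their threads do, so a test runner that pools threads would normally clear them explicitly, for example with ThreadLocal.remove().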
[jira] [Created] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table
Richard Ding created PIG-3097:

Summary: HiveColumnarLoader doesn't correctly load partitioned Hive table
Key: PIG-3097
URL: https://issues.apache.org/jira/browse/PIG-3097
Project: Pig
Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding

Given a partitioned Hive table:
{code}
hive> describe mytable;
OK
f1 string
f2 string
f3 string
partition_dt string
{code}

The following Pig script gives the correct schema:
{code}
grunt> A = load '/hive/warehouse/mytable' using org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 string');
grunt> describe A
A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
{code}

But the command
{code}
grunt> dump A
{code}
only produces the first column of each record in the table (all four columns are expected).