[jira] [Commented] (PIG-3095) "which" is called many, many times for each Pig STREAM statement

2012-12-16 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533301#comment-13533301
 ] 

Cheolsoo Park commented on PIG-3095:


+1.

I will run the e2e tests to be safe and commit your patch unless I see any 
failures. Thank you!

 "which" is called many, many times for each Pig STREAM statement
 

 Key: PIG-3095
 URL: https://issues.apache.org/jira/browse/PIG-3095
 Project: Pig
  Issue Type: Bug
  Components: grunt, impl
Affects Versions: 0.12
Reporter: Nick White
Assignee: Nick White
  Labels: patch, performance
 Fix For: 0.12

 Attachments: PIG-3095.1.patch, PIG-3095.patch


 STREAM statements are checked by the LogicalPlanBuilder as it comes across 
 them, and these checks include running the system utility "which". However, 
 due to the backtracking parsing mechanism, "which" is called repeatedly with 
 the same arguments (I noticed this while profiling a script with 4 STREAM 
 statements: "which" was run over 230 times!). The attached patch just caches 
 the return value of "which", reducing the overhead of running a system 
 process to a Map lookup.
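
 The caching idea described above can be sketched roughly as follows. This is 
 not the attached patch: the class name WhichCache, the injected resolver 
 function, and the empty-string sentinel are all hypothetical, and the sketch 
 assumes Java 8 or later for lambdas and computeIfAbsent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: memoizes an expensive executable lookup by name. */
final class WhichCache {
    // Hypothetical sentinel for "not found"; ConcurrentHashMap cannot store null.
    private static final String NOT_FOUND = "";

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver;

    /** The resolver stands in for whatever actually runs the system utility. */
    WhichCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Returns the resolved path, or null if the resolver found nothing. */
    String lookup(String executable) {
        String path = cache.computeIfAbsent(executable, name -> {
            String resolved = resolver.apply(name);
            return resolved == null ? NOT_FOUND : resolved;
        });
        return path.equals(NOT_FOUND) ? null : path;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in resolver; a real one would spawn the "which" process.
        WhichCache cache = new WhichCache(name -> {
            calls.incrementAndGet();
            return "/usr/bin/" + name;
        });
        for (int i = 0; i < 230; i++) {
            cache.lookup("perl");
        }
        System.out.println(calls.get() + " resolver call(s) for 230 lookups");
    }
}
```

 Negative results are cached too, so a missing executable is also resolved 
 only once per name.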

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig

2012-12-16 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533484#comment-13533484
 ] 

Jonathan Coveney commented on PIG-2764:
---

I would have no problem tidying this up for inclusion, but all of the comments 
are pretty vague. Would love some concrete feedback on whether or not this 
deserves to be in Pig, and how it should be or could be different.

I think that more broadly speaking it would be huge to make type support a lot 
easier to add, but that's a separate JIRA.

 Add a biginteger and bigdecimal type to pig
 ---

 Key: PIG-2764
 URL: https://issues.apache.org/jira/browse/PIG-2764
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch


 I think it would be useful for applications where precision is more important 
 than speed to have the option of using Java's BigDecimal and BigInteger types 
 natively.
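
 As a small illustration of the precision trade-off mentioned above, the 
 following sketch compares Java's primitive types with java.math.BigDecimal 
 and java.math.BigInteger. The class and method names are hypothetical and 
 have nothing to do with the attached patches.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical demo of where double and long lose precision. */
final class PrecisionSketch {
    /** Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly. */
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    /** BigDecimal built from strings keeps the decimal digits exactly. */
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    /** long arithmetic silently wraps around on overflow. */
    static long longPastMax() {
        long max = Long.MAX_VALUE;
        return max + 1;
    }

    /** BigInteger grows as needed instead of wrapping. */
    static BigInteger bigIntegerPastMax() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println("double exact:     " + doubleSumIsExact());
        System.out.println("BigDecimal exact: " + decimalSumIsExact());
        System.out.println("long past max:    " + longPastMax());
        System.out.println("BigInteger:       " + bigIntegerPastMax());
    }
}
```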



[jira] [Updated] (PIG-3095) "which" is called many, many times for each Pig STREAM statement

2012-12-16 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3095:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Nick!



Build failed in Jenkins: Pig-trunk #1379

2012-12-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/Pig-trunk/1379/changes

Changes:

[cheolsoo] PIG-3095: "which" is called many, many times for each Pig STREAM 
statement (nwhite via cheolsoo)

--
[...truncated 6587 lines...]

[jira] Subscription: PIG patch available

2012-12-16 Thread jira
Issue Subscription
Filter: PIG patch available (35 issues)

Subscriber: pigdaily

Key         Summary
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3086    Allow A Prefix To Be Added To URIs In PigUnit Tests
            https://issues.apache.org/jira/browse/PIG-3086
PIG-3078    Make a UDF that, given a string, returns just the columns prefixed
            by that string
            https://issues.apache.org/jira/browse/PIG-3078
PIG-3073    POUserFunc creating log spam for large scripts
            https://issues.apache.org/jira/browse/PIG-3073
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3067    HBaseStorage should be split up to become more managable
            https://issues.apache.org/jira/browse/PIG-3067
PIG-3066    Fix TestPigRunner in trunk
            https://issues.apache.org/jira/browse/PIG-3066
PIG-3057    make readField protected to be able to override it if we extend
            PigStorage
            https://issues.apache.org/jira/browse/PIG-3057
PIG-3051    java.lang.IndexOutOfBoundsException failure with LimitOptimizer +
            ColumnPruning
            https://issues.apache.org/jira/browse/PIG-3051
PIG-3029    TestTypeCheckingValidatorNewLP has some path reference issues for
            cross-platform execution
            https://issues.apache.org/jira/browse/PIG-3029
PIG-3028    testGrunt dev test needs some command filters to run correctly
            without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden
            multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address
            OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline
            script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is
            brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2878    Pig current releases lack a UDF equalIgnoreCase.This function
            returns a Boolean value indicating whether string left is equal to
            string right. This check is case insensitive.
            https://issues.apache.org/jira/browse/PIG-2878
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2645    PigSplit does not handle the case where SerializationFactory
            returns null
            https://issues.apache.org/jira/browse/PIG-2645
PIG-2614    AvroStorage crashes on LOADING a single bad error
            https://issues.apache.org/jira/browse/PIG-2614
PIG-2507    Semicolon in paramenters for UDF results in parsing error
            https://issues.apache.org/jira/browse/PIG-2507
PIG-2433    Jython import module not working if module path is in classpath
            https://issues.apache.org/jira/browse/PIG-2433
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting
            languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2362    Rework Ant build.xml to use macrodef instead of antcall
            https://issues.apache.org/jira/browse/PIG-2362
PIG-2312    NPE when relation and column share the same name and used in Nested
            Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to
            more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:

[jira] [Created] (PIG-3096) Make PigUnit thread safe

2012-12-16 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3096:
--

 Summary: Make PigUnit thread safe
 Key: PIG-3096
 URL: https://issues.apache.org/jira/browse/PIG-3096
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.12


Currently, {{PigUnit}} is not thread-safe because {{Cluster}} and {{PigServer}} 
are declared as static. Converting them to {{ThreadLocal}} allows PigUnit to 
run in a multi-threaded environment.
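
The static-to-ThreadLocal conversion described above could look roughly like 
the sketch below. It is not the PigTest.java change under review: FakeServer 
is a hypothetical stand-in for a heavyweight object such as PigServer, the 
other names are also made up for illustration, and ThreadLocal.withInitial 
assumes Java 8 or later.

```java
/** Hypothetical sketch of replacing a shared static field with a ThreadLocal. */
final class PigUnitThreadLocalSketch {
    /** Hypothetical stand-in for a non-thread-safe object such as PigServer. */
    static final class FakeServer {
        final String owner = Thread.currentThread().getName();
    }

    // Before: private static FakeServer server;   (one instance shared by all threads)
    // After:  each thread lazily creates and keeps its own instance.
    private static final ThreadLocal<FakeServer> SERVER =
            ThreadLocal.withInitial(FakeServer::new);

    /** Returns the calling thread's own instance. */
    static FakeServer server() {
        return SERVER.get();
    }

    /** Helper for the demo: fetches the instance that a fresh thread sees. */
    static FakeServer serverSeenByNewThread() {
        final FakeServer[] seen = new FakeServer[1];
        Thread worker = new Thread(() -> seen[0] = server());
        worker.start();
        try {
            worker.join(); // join() makes the worker's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        FakeServer mine = server();
        FakeServer theirs = serverSeenByNewThread();
        System.out.println("same thread reuses instance: " + (mine == server()));
        System.out.println("other thread gets its own:   " + (mine != theirs));
    }
}
```

One trade-off of this approach is that every thread pays the setup cost of 
its own instance, and instances held by pooled threads live as long as those 
threads do.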



Review Request: PIG-3096 Make PigUnit thread safe

2012-12-16 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8631/
---

Review request for pig and Santhosh Srinivasan.


Description
---

Currently, PigUnit is not thread-safe because Cluster and PigServer are 
declared as static. Converting them to ThreadLocal allows PigUnit to run in a 
multi-threaded environment.


This addresses bug PIG-3096.
https://issues.apache.org/jira/browse/PIG-3096


Diffs
-

  test/org/apache/pig/pigunit/PigTest.java 50a5c79 

Diff: https://reviews.apache.org/r/8631/diff/


Testing
---

ant test -Dtestcase=TestPigTest

I also tested it by running multiple PigUnit cases in parallel with 
tempus-fugit (http://tempusfugitlibrary.org/documentation/junit/parallel/) on a 
real cluster.


Thanks,

Cheolsoo Park



[jira] [Created] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table

2012-12-16 Thread Richard Ding (JIRA)
Richard Ding created PIG-3097:
-

 Summary: HiveColumnarLoader doesn't correctly load partitioned 
Hive table 
 Key: PIG-3097
 URL: https://issues.apache.org/jira/browse/PIG-3097
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding



Given a partitioned Hive table:

{code}
hive> describe mytable;
OK
f1              string
f2              string
f3              string
partition_dt    string
{code}

The following Pig script gives the correct schema:

{code}
grunt> A = load '/hive/warehouse/mytable' using
    org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 string');
grunt> describe A
A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
{code}

But, the command

{code}
grunt> dump A
{code}

only produces the first column of all records in the table (all four columns 
are expected).
