Javascript UDFs don't work in local mode? At all?

2012-08-14 Thread Russell Jurney
I am loading 'test.js' in my script exactly as described in the Pig 0.10
documentation, and the file contains the example JavaScript UDFs pasted
verbatim, yet I get:


Pig Stack Trace
---
ERROR 2998: Unhandled internal error. org/mozilla/javascript/EcmaError

java.lang.NoClassDefFoundError: org/mozilla/javascript/EcmaError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.pig.scripting.ScriptEngine.getInstance(ScriptEngine.java:254)
at org.apache.pig.PigServer.registerCode(PigServer.java:523)
at
org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:422)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:419)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException:
org.mozilla.javascript.EcmaError
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 16 more
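
The NoClassDefFoundError means Rhino (js.jar, the org.mozilla.javascript
classes) is not on the classpath when Pig tries to load its JavaScript script
engine. A minimal sketch of the registration path that fails, assuming Pig
0.10's PigServer API and that js.jar has been added to PIG_CLASSPATH or Pig's
lib/ directory first:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class JsUdfSketch {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.LOCAL);
            // Same call the stack trace above goes through; it needs Rhino
            // on the classpath before the javascript engine can load.
            pig.registerCode("test.js", "javascript", "myfuncs");
        }
    }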



Help, I wanted to include a javascript UDF in a blog post :(
-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434787#comment-13434787
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
I believe you should be able to set the default timezone property in the 
PigContext constructor, and also let the user override the default. In the 
backend, you can access the value using something like 
PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
{quote}

Thank you, Thejas! Let me investigate this issue.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434760#comment-13434760
 ] 

Russell Jurney commented on PIG-1314:
-

I agree with Thejas. The user will want to control the timezone of NOW() 
without having to reconfigure the Hadoop cluster or contact the Hadoop 
administrator. Setting this on the client is consistent with Pig being a 
client-side technology.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: Review for PIG-1314 - add datetime type in pig

2012-08-14 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5414/
---

(Updated Aug. 15, 2012, 1:46 a.m.)


Review request for pig.


Changes
---

latest patch from Zhijie


Description
---

Review for PIG-1314


This addresses bug PIG-1314.
https://issues.apache.org/jira/browse/PIG-1314


Diffs (updated)
-

  http://svn.apache.org/repos/asf/pig/trunk/.eclipse.templates/.classpath 
1371785 
  http://svn.apache.org/repos/asf/pig/trunk/conf/pig.properties 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/SequenceFileLoader.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/pig/SchemaConverter.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/pig/comparator/DateTimeExpr.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/pig/comparator/ExprUtils.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/schema/ColumnType.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/schema/SchemaParser.jjt
 1371785 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/LoadCaster.java 
1371785 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 
1371785 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigWarning.java 
1371785 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/StoreCaster.java 
1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/DateTimeWritable.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/HDataType.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigDateTimeRawComparator.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ComparisonOperator.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POIsNull.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POMapLookUp.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java
 1371785 
  
http://svn.apache.org/repos/asf/pig/trun

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434733#comment-13434733
 ] 

Thejas M Nair commented on PIG-1314:


bq. 2. According to your last response, I'm not clear how the client's default 
timezone can be sent to the server with the code. In my opinion, the default 
timezone should be specified on the server side by configuration, which should 
be taken care of by administrators. What do you think about this?

I believe you should be able to set the default timezone property in the 
PigContext constructor, and also let the user override the default. In the 
backend, you can access the value using something like 
PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
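
A minimal sketch of that scheme, assuming the property name from the comment 
above ("pig.datetime.default.tz"): the client seeds a user-overridable default, 
and backend code reads it from the serialized job conf with a fallback:

{code}
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce;

public class DefaultTzSketch {
    public static void main(String[] args) throws Exception {
        // front end: the user overrides the default timezone for this script
        Properties props = new Properties();
        props.setProperty("pig.datetime.default.tz", "UTC");
        PigServer pig = new PigServer(ExecType.LOCAL, props);
    }

    // back end: e.g. called from a cast or an eval func
    static String defaultTz() {
        Configuration conf = PigMapReduce.sJobConfInternal.get();
        String tz = (conf == null) ? null : conf.get("pig.datetime.default.tz");
        // fall back to the task JVM's default when the property is unset
        return (tz != null) ? tz : java.util.TimeZone.getDefault().getID();
    }
}
{code}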


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2877) Make SchemaTuple work in foreach (and thus, in loads)

2012-08-14 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2877:
--

Attachment: PIG-2877-0.patch

> Make SchemaTuple work in foreach (and thus, in loads) 
> --
>
> Key: PIG-2877
> URL: https://issues.apache.org/jira/browse/PIG-2877
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2877-0.patch
>
>
> This wires foreaches to use SchemaTuple. As an aside, while SchemaTuple is 
> not turned on by default, I made sure that ant test-commit still passes with 
> it turned on (there were some edge cases and lifecycle issues that needed to 
> be cleaned up).
> Further, I refactored some of the tests that I needed to work with for other 
> reasons to be much cleaner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2877) Make SchemaTuple work in foreach (and thus, in loads)

2012-08-14 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2877:
--

Status: Patch Available  (was: Open)

> Make SchemaTuple work in foreach (and thus, in loads) 
> --
>
> Key: PIG-2877
> URL: https://issues.apache.org/jira/browse/PIG-2877
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2877-0.patch
>
>
> This wires foreaches to use SchemaTuple. As an aside, while SchemaTuple is 
> not turned on by default, I made sure that ant test-commit still passes with 
> it turned on (there were some edge cases and lifecycle issues that needed to 
> be cleaned up).
> Further, I refactored some of the tests that I needed to work with for other 
> reasons to be much cleaner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (PIG-2877) Make SchemaTuple work in foreach (and thus, in loads)

2012-08-14 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-2877:
-

 Summary: Make SchemaTuple work in foreach (and thus, in loads) 
 Key: PIG-2877
 URL: https://issues.apache.org/jira/browse/PIG-2877
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.11


This wires foreaches to use SchemaTuple. As an aside, while SchemaTuple is not 
turned on by default, I made sure that ant test-commit still passes with it 
turned on (there were some edge cases and lifecycle issues that needed to be 
cleaned up).

Further, I refactored some of the tests that I needed to work with for other 
reasons to be much cleaner.
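
A minimal sketch of opting in from client code; the property name 
"pig.schematuple" is an assumption based on the feature's configuration key and 
should be checked against PigConfiguration before relying on it:

{code}
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class SchemaTupleOptIn {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // assumed opt-in key; SchemaTuple is off by default
        props.setProperty("pig.schematuple", "true");
        PigServer pig = new PigServer(ExecType.LOCAL, props);
        pig.registerQuery("a = LOAD 'data' AS (x:int, y:chararray);");
        // with this patch, the foreach is now eligible for SchemaTuple
        pig.registerQuery("b = FOREACH a GENERATE x, y;");
    }
}
{code}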

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-14 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated PIG-1891:
-

Attachment: PIG-1891-2.patch

Hey Alan, what do you think of this?

It restores cleanupOnFailureImpl (why is this exposed in the interface at all, 
btw?) and does not attempt to implement cleanupOnSuccess, just adds it where 
relevant. This way users can implement it themselves if they need it in their 
StoreFunc.

Also: would you look at the way it is wired into PigServer#launchPlan()? I'm 
giving it the same args that cleanupOnFailure() gets, but I'm not certain this 
is the information a user would want it to receive. I expect that if they do 
implement cleanupOnSuccess, these args will provide the data to delete? In the 
DB example in this thread, will the data already have been successfully loaded 
into the DB by the user code, so that this merely has to erase the unneeded 
files the data was stored in during earlier processing steps? Or would 
cleanupOnSuccess include the 'load to database' and 'erase leftover files' 
code together?

Anyway, let me know if this is what we need and whether I'm on the right 
track. Thanks again.
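
For concreteness, a hedged sketch of what a user's StoreFunc might do with the 
proposed hook, assuming it receives the same (String location, Job job) 
arguments as cleanupOnFailure(); loadIntoDb() is a hypothetical 
application-specific helper:

{code}
import java.io.IOException;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.StoreFunc;
import org.apache.pig.builtin.PigStorage;

public class DbAwareStorage extends PigStorage {
    @Override
    public void cleanupOnFailure(String location, Job job) throws IOException {
        // job failed: delete the partial output files
        StoreFunc.cleanupOnFailureImpl(location, job);
    }

    // proposed job-level success hook (signature assumed to mirror the above)
    public void cleanupOnSuccess(String location, Job job) throws IOException {
        loadIntoDb(location, job);                     // hypothetical: push results to the DB
        StoreFunc.cleanupOnFailureImpl(location, job); // then erase the leftover files
    }

    private void loadIntoDb(String location, Job job) {
        // application-specific upload of the files under 'location'
    }
}
{code}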

> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Alex Rovner
>Priority: Minor
>  Labels: patch
> Attachments: PIG-1891-1.patch, PIG-1891-2.patch
>
>
> We are in the process of using Pig for various data processing and component 
> integration. Here is where we feel Pig storage funcs fall short:
> They are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs which need to "upload" results into another system:
> a DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
>  and what I see is essentially a mechanism which, for each task, does the 
> following:
> 1. Creates a record writer (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes a commit or rollback depending on whether the task was successful.
> While this approach works great at the task level, it does not work at all at 
> the job level. 
> If certain tasks succeed but the overall job fails, partial records get 
> uploaded into the DB.
> Any ideas on a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DBs once the Pig job is successful. 
> While the approach works, it's not really integrated into Pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Jenkins: Pig-trunk #1298

2012-08-14 Thread Apache Jenkins Server
See 

Changes:

[jcoveney] PIG-2876: Bump up Xerces version (jcoveney)

[billgraham] PIG-2871: Refactor signature for PigReducerEstimator (billgraham)

[billgraham] PIG-2866: PigServer fails with macros without a script file 
(billgraham)

--
[...truncated 6468 lines...]
 [findbugs]   org.apache.hadoop.util.RunJar
 [findbugs]   org.jruby.RubyBoolean
 [findbugs]   org.apache.hadoop.mapred.Counters$Group
 [findbugs]   com.jcraft.jsch.ChannelExec
 [findbugs]   org.apache.hadoop.hbase.util.Base64
 [findbugs]   org.antlr.runtime.TokenStream
 [findbugs]   org.apache.hadoop.io.IOUtils
 [findbugs]   org.jruby.RubyBignum
 [findbugs]   com.google.common.util.concurrent.CheckedFuture
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile$Reader$Scanner$Entry
 [findbugs]   org.apache.hadoop.fs.FSDataInputStream
 [findbugs]   org.python.core.PyObject
 [findbugs]   jline.History
 [findbugs]   org.jruby.embed.internal.LocalContextProvider
 [findbugs]   org.apache.hadoop.io.BooleanWritable
 [findbugs]   org.apache.log4j.Logger
 [findbugs]   org.apache.hadoop.hbase.filter.FamilyFilter
 [findbugs]   groovy.lang.Tuple
 [findbugs]   org.antlr.runtime.IntStream
 [findbugs]   org.apache.hadoop.util.ReflectionUtils
 [findbugs]   org.apache.hadoop.fs.ContentSummary
 [findbugs]   org.jruby.runtime.builtin.IRubyObject
 [findbugs]   org.jruby.RubyInteger
 [findbugs]   org.python.core.PyTuple
 [findbugs]   org.mortbay.log.Log
 [findbugs]   org.apache.hadoop.conf.Configuration
 [findbugs]   com.google.common.base.Joiner
 [findbugs]   org.apache.hadoop.mapreduce.lib.input.FileSplit
 [findbugs]   org.apache.hadoop.mapred.Counters$Counter
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs]   org.apache.hadoop.mapred.JobPriority
 [findbugs]   org.apache.commons.cli.Options
 [findbugs]   org.apache.hadoop.mapred.JobID
 [findbugs]   org.apache.hadoop.util.bloom.BloomFilter
 [findbugs]   org.python.core.PyFrame
 [findbugs]   org.apache.hadoop.hbase.filter.CompareFilter
 [findbugs]   org.apache.hadoop.util.VersionInfo
 [findbugs]   org.python.core.PyString
 [findbugs]   org.apache.hadoop.io.Text$Comparator
 [findbugs]   org.jruby.runtime.Block
 [findbugs]   org.antlr.runtime.MismatchedSetException
 [findbugs]   org.apache.hadoop.io.BytesWritable
 [findbugs]   org.apache.hadoop.fs.FsShell
 [findbugs]   org.mozilla.javascript.ImporterTopLevel
 [findbugs]   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
 [findbugs]   org.apache.hadoop.mapred.TaskReport
 [findbugs]   org.antlr.runtime.tree.RewriteRuleSubtreeStream
 [findbugs]   org.apache.commons.cli.HelpFormatter
 [findbugs]   com.google.common.collect.Maps
 [findbugs]   org.mozilla.javascript.NativeObject
 [findbugs]   org.apache.hadoop.hbase.HConstants
 [findbugs]   org.apache.hadoop.io.serializer.Deserializer
 [findbugs]   org.antlr.runtime.FailedPredicateException
 [findbugs]   org.apache.hadoop.io.compress.CompressionCodec
 [findbugs]   org.jruby.RubyNil
 [findbugs]   org.apache.hadoop.fs.FileStatus
 [findbugs]   org.apache.hadoop.hbase.client.Result
 [findbugs]   org.apache.hadoop.mapreduce.JobContext
 [findbugs]   org.codehaus.jackson.JsonGenerator
 [findbugs]   org.apache.hadoop.mapreduce.TaskAttemptContext
 [findbugs]   org.apache.hadoop.io.BytesWritable$Comparator
 [findbugs]   org.apache.hadoop.io.LongWritable$Comparator
 [findbugs]   org.codehaus.jackson.map.util.LRUMap
 [findbugs]   org.apache.hadoop.hbase.util.Bytes
 [findbugs]   org.antlr.runtime.MismatchedTokenException
 [findbugs]   org.codehaus.jackson.JsonParser
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   org.python.core.PyException
 [findbugs]   org.apache.commons.cli.ParseException
 [findbugs]   org.apache.hadoop.io.compress.CompressionOutputStream
 [findbugs]   org.apache.hadoop.hbase.filter.WritableByteArrayComparable
 [findbugs]   org.antlr.runtime.tree.CommonTreeNodeStream
 [findbugs]   org.apache.log4j.Level
 [findbugs]   org.apache.hadoop.hbase.client.Scan
 [findbugs]   org.jruby.anno.JRubyMethod
 [findbugs]   org.apache.hadoop.mapreduce.Job
 [findbugs]   com.google.common.util.concurrent.Futures
 [findbugs]   org.apache.commons.logging.LogFactory
 [findbugs]   org.apache.commons.codec.binary.Base64
 [findbugs]   org.codehaus.jackson.map.ObjectMapper
 [findbugs]   org.apache.hadoop.fs.FileSystem
 [findbugs]   org.jruby.embed.LocalContextScope
 [findbugs]   org.apache.hadoop.hbase.filter.FilterList$Operator
 [findbugs]   org.jruby.RubySymbol
 [findbugs]   org.apache.hadoop.hbase.io.ImmutableBytesWritable
 [findbugs]   org.apache.hadoop.io.serializer.SerializationFactory
 [findbugs]   org.antlr.runtime.tree.TreeAdaptor
 [findbugs]   org.apache.hadoop.mapred.RunningJob
 [findbugs]   org.antlr.runtime.CommonTokenStream
 [findbugs]   org.apache.hadoop.io.DataInputBuffer
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile
 [findbugs]   org.apache.commons.cli.GnuParser
 [findbugs]   org.mozilla.javascript.Context
 [findbugs]

[jira] [Updated] (PIG-2876) Bump up Xerces version

2012-08-14 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2876:
--

Resolution: Fixed
  Assignee: Jonathan Coveney
Status: Resolved  (was: Patch Available)

> Bump up Xerces version
> --
>
> Key: PIG-2876
> URL: https://issues.apache.org/jira/browse/PIG-2876
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2876-0.patch
>
>
> In some cases on some environments, our version of xerces has errors with 
> Hadoop. Bumping the version and adding Xalan fixes this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2876) Bump up Xerces version

2012-08-14 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434605#comment-13434605
 ] 

Bill Graham commented on PIG-2876:
--

+1!

Thanks for this! This fix is what I need to run tests that use the mini 
cluster on my MacBook Pro.

> Bump up Xerces version
> --
>
> Key: PIG-2876
> URL: https://issues.apache.org/jira/browse/PIG-2876
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2876-0.patch
>
>
> In some cases on some environments, our version of xerces has errors with 
> Hadoop. Bumping the version and adding Xalan fixes this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2876) Bump up Xerces version

2012-08-14 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2876:
--

Fix Version/s: 0.11
   Status: Patch Available  (was: Open)

> Bump up Xerces version
> --
>
> Key: PIG-2876
> URL: https://issues.apache.org/jira/browse/PIG-2876
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2876-0.patch
>
>
> In some cases on some environments, our version of xerces has errors with 
> Hadoop. Bumping the version and adding Xalan fixes this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434569#comment-13434569
 ] 

Eli Reisman commented on PIG-1891:
--

I'll take a look at where the framework should call it. It's been a while, but 
as I recall, cleanupImpl is called from within the same old cleanupFailure that 
was already there, still in place. I moved the code to cleanupImpl so I could 
also call it from cleanupSuccess, since the function was the same; only the 
context of the call differs. I suppose when people override these methods there 
might be more differences. I'll take a look at the code today and try to have 
another patch up ASAP. Thanks again; if there's anything else I've overlooked, 
please let me know.




> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Alex Rovner
>Priority: Minor
>  Labels: patch
> Attachments: PIG-1891-1.patch
>
>
> We are in the process of using Pig for various data processing and component 
> integration. Here is where we feel Pig storage funcs fall short:
> They are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs which need to "upload" results into another system:
> a DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
>  and what I see is essentially a mechanism which, for each task, does the 
> following:
> 1. Creates a record writer (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes a commit or rollback depending on whether the task was successful.
> While this approach works great at the task level, it does not work at all at 
> the job level. 
> If certain tasks succeed but the overall job fails, partial records get 
> uploaded into the DB.
> Any ideas on a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DBs once the Pig job is successful. 
> While the approach works, it's not really integrated into Pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2876) Bump up Xerces version

2012-08-14 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2876:
--

Attachment: PIG-2876-0.patch

> Bump up Xerces version
> --
>
> Key: PIG-2876
> URL: https://issues.apache.org/jira/browse/PIG-2876
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2876-0.patch
>
>
> In some cases on some environments, our version of xerces has errors with 
> Hadoop. Bumping the version and adding Xalan fixes this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (PIG-2876) Bump up Xerces version

2012-08-14 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-2876:
-

 Summary: Bump up Xerces version
 Key: PIG-2876
 URL: https://issues.apache.org/jira/browse/PIG-2876
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney


In some cases on some environments, our version of xerces has errors with 
Hadoop. Bumping the version and adding Xalan fixes this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2578) Multiple Store-commands mess up mapred.output.dir.

2012-08-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434548#comment-13434548
 ] 

Rohini Palaniswamy commented on PIG-2578:
-

Did some debugging with and without PIG-2578. Multiple stores using PigStorage 
worked fine in both cases. This is because before every getOutputFormat call 
there is a setLocation call with a copy of the JobContext or TaskAttemptContext, 
and that copy is what gets passed to the getOutputCommitter(), getRecordWriter(), 
and checkOutputSpecs() calls. So the output format actually runs with the 
correct configuration, and multiple store commands don't always get messed up. 
The corner-case problem I see is this: if one instance of the store sets a 
configuration key to a specific value and another instance of the store does 
not set that key at all, the second instance will still see the value put into 
the copied job by the first instance (without PIG-2578).

The actual problem was with the HCatalog code when this jira was filed. It set 
mapred.output.dir and a lot of other properties in the front end but not in the 
backend during setStoreLocation. 
http://svn.apache.org/viewvc/incubator/hcatalog/branches/branch-0.4/src/java/org/apache/hcatalog/pig/HCatStorer.java?revision=1325867&view=markup
If it had set mapred.output.dir in the backend as well, it would have worked 
fine. It was later fixed to do so.
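
A hedged sketch of the call pattern described above (simplified, not Pig's 
actual code): each store gets its own copy of the job, so setStoreLocation() 
cannot leak settings such as mapred.output.dir into another store's 
configuration:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.StoreFuncInterface;

public class StoreIsolationSketch {
    // hypothetical helper standing in for the per-store setup described above
    static void prepareStore(StoreFuncInterface store, String location, Job job)
            throws Exception {
        // clone the conf so this store's changes stay private to it
        Job jobCopy = new Job(new Configuration(job.getConfiguration()));
        store.setStoreLocation(location, jobCopy);          // front end and back end
        store.getOutputFormat().checkOutputSpecs(jobCopy);  // runs against the copy
    }
}
{code}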

> Multiple Store-commands mess up mapred.output.dir.
> --
>
> Key: PIG-2578
> URL: https://issues.apache.org/jira/browse/PIG-2578
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1, 0.9.2
>Reporter: Mithun Radhakrishnan
>Assignee: Daniel Dai
> Fix For: 0.10.0, 0.11
>
> Attachments: PIG-2578-1.patch
>
>
> When one runs a pig-script with multiple storers, one sees the following:
> 1. When run as a script, Pig launches a single job.
> 2. PigOutputCommitter::setupJob() calls the 
> underlyingOutputCommitter::setupJob(), once for each storer. But the 
> mapred.output.dir is the same for both calls, even though the storers write 
> to different locations. 
> This was originally seen in HCATALOG-276, when HCatalog's end-to-end tests 
> are run against Pig.
> (https://issues.apache.org/jira/browse/HCATALOG-276)
> Sample pig-script (near identical to HCatalog's Pig_Checkin_4 test):
> a = load 'keyvals' using org.apache.hcatalog.pig.HCatLoader();
> split a into b if key<200, c if key >=200;
> store b into 'keyvals_lt200' using org.apache.hcatalog.pig.HCatStorer();
> store c into 'keyvals_ge200' using org.apache.hcatalog.pig.HCatStorer();
> I've suggested a workaround in HCat for the time being, but I think this 
> might be something that needs fixing in Pig.
> Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (PIG-2866) PigServer fails with macros without a script file

2012-08-14 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy resolved PIG-2866.


Resolution: Fixed

> PigServer fails with macros without a script file
> -
>
> Key: PIG-2866
> URL: https://issues.apache.org/jira/browse/PIG-2866
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Bill Graham
>Assignee: Bill Graham
>Priority: Minor
> Fix For: 0.11
>
> Attachments: PIG-2866.1.patch, PIG-2866.2.patch, PIG-2866.3.patch, 
> PIG-2866.4.patch
>
>
> Making a call to {{PigServer.registerQuery}} with an {{InputStream}} will 
> fail if the script contains a macro. This is because 
> {{QueryParserDriver.makeMacroDef}} requires a filename. 
> {{QueryParserDriver.makeMacroDef}} should be changed to not assume that the 
> script input is from a file.
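
A minimal repro sketch of the bug described above, assuming the stream-based 
entry point is PigServer.registerScript(InputStream) (the description says 
registerQuery, but the stream overload lives on registerScript):

{code}
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class MacroStreamSketch {
    public static void main(String[] args) throws Exception {
        String script =
            "DEFINE keep_valid(A) RETURNS B { $B = FILTER $A BY $0 IS NOT NULL; };\n"
          + "in = LOAD 'input' AS (x:int);\n"
          + "out = keep_valid(in);\n";
        PigServer pig = new PigServer(ExecType.LOCAL);
        InputStream in = new ByteArrayInputStream(script.getBytes("UTF-8"));
        pig.registerScript(in); // fails before the fix: no script filename for the macro
    }
}
{code}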

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2484) Fix several e2e test failures/aborts for 23

2012-08-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434441#comment-13434441
 ] 

Rohini Palaniswamy commented on PIG-2484:
-

test/e2e/pig/lib/hadoop-0.23.0-streaming.jar also needs to be checked in. It 
was not part of the patch as it is a binary. I had mentioned it as an 
additional step during check-in. 

> Fix several e2e test failures/aborts for 23
> ---
>
> Key: PIG-2484
> URL: https://issues.apache.org/jira/browse/PIG-2484
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.2, 0.10.0, 0.11
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.10.0, 0.9.3, 0.11
>
> Attachments: PIG-2484-1.patch, PIG-2484-2.patch, PIG-2484-3.patch, 
> PIG-2484-4-branch0.9.patch
>
>
> There are still a couple of e2e test aborts/failures for hadoop23. Most of 
> them are due to test infrastructure, minor backward incompatibility change in 
> 23, or recent changes in Pig. Here is a list:
> Scripting_1/Scripting_2: MAPREDUCE-3700
> Native_3: 23 tests need a hadoop23-streaming.jar
> MonitoredUDF_1: Seems related to guava upgrade (PIG-2460), Pig's guava is 
> newer than hadoop23's
> UdfException_1, UdfException_2, UdfException_3, UdfException_4: Error message 
> change
> Checkin_2, GroupAggFunc_7, GroupAggFunc_9, GroupAggFunc_12, GroupAggFunc_13, 
> Types_6, Scalar_1: float precision
> Limit_2: The specific output records change, test infrastructure should allow 
> this

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2821) HBaseStorage should work with secure hbase

2012-08-14 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2821:


Attachment: PIG-2821-1.patch

Removed the reverting of PIG-2578. Will deal with 
issues from PIG-2578 in PIG-2870 or another jira. 

Changed HBaseStorage to store all hbase properties in UDFContext properties. 
Credentials were getting added to the Job inadvertently when 
PigOutputFormat.checkOutputSpecs called setStoreLocation, so there is a path by 
which adding credentials works even with PIG-2578, though an unexpected one. 
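
A hedged sketch of the UDFContext approach described above; the property key 
and value here are illustrative, not HBaseStorage's actual keys, and 
contextSignature would come from setUDFContextSignature():

{code}
import java.util.Properties;
import org.apache.pig.backend.hadoop.hbase.HBaseStorage;
import org.apache.pig.impl.util.UDFContext;

public class UdfContextStashSketch {
    static void stash(String contextSignature, String serializedHBaseProps) {
        // per-UDF properties are serialized and shipped to the backend,
        // so nothing needs to be written into the shared Job configuration
        Properties p = UDFContext.getUDFContext()
                .getUDFProperties(HBaseStorage.class,
                                  new String[] { contextSignature });
        p.setProperty("hbase.config.blob", serializedHBaseProps); // hypothetical key
    }
}
{code}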

> HBaseStorage should work with secure hbase
> --
>
> Key: PIG-2821
> URL: https://issues.apache.org/jira/browse/PIG-2821
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Francis Liu
>Assignee: Rohini Palaniswamy
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2821-1.patch, PIG-2821-branch10.patch, 
> PIG-2821-trunk.patch
>
>
> HBaseStorage needs to add HBase delegation token to the Job object if hbase 
> security is enabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2821) HBaseStorage should work with secure hbase

2012-08-14 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2821:


Status: Patch Available  (was: Open)

> HBaseStorage should work with secure hbase
> --
>
> Key: PIG-2821
> URL: https://issues.apache.org/jira/browse/PIG-2821
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Francis Liu
>Assignee: Rohini Palaniswamy
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2821-1.patch, PIG-2821-branch10.patch, 
> PIG-2821-trunk.patch
>
>
> HBaseStorage needs to add HBase delegation token to the Job object if hbase 
> security is enabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434294#comment-13434294
 ] 

Thejas M Nair commented on PIG-2662:


Rajesh, 
With the patch, the TestPoissonSampleLoader test cases fail. Can you please 
take a look? 
Please let me know if you need any help with that.


> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from the properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 
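
A minimal sketch of how those two knobs are meant to be supplied from client 
code (the values are illustrative); the bug above is that 
PoissonSampleLoader.computeSamples never reads them back:

{code}
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class SkewJoinConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("pig.sksampler.samplerate", "17");
        props.setProperty("pig.skewedjoin.reduce.memusage", "0.3");
        PigServer pig = new PigServer(ExecType.MAPREDUCE, props);
        pig.registerQuery("A = LOAD 'left' AS (k:int, v:chararray);");
        pig.registerQuery("B = LOAD 'right' AS (k:int, w:chararray);");
        // the skew join is where the two properties above should take effect
        pig.registerQuery("J = JOIN A BY k, B BY k USING 'skewed';");
    }
}
{code}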

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2866) PigServer fails with macros without a script file

2012-08-14 Thread Tony Stevenson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Stevenson updated PIG-2866:


Priority: Minor  (was: Major)

> PigServer fails with macros without a script file
> -
>
> Key: PIG-2866
> URL: https://issues.apache.org/jira/browse/PIG-2866
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Bill Graham
>Assignee: Bill Graham
>Priority: Minor
> Fix For: 0.11
>
> Attachments: PIG-2866.1.patch, PIG-2866.2.patch, PIG-2866.3.patch, 
> PIG-2866.4.patch
>
>
> Making a call to {{PigServer.registerQuery}} with an {{InputStream}} will 
> fail if the script contains a macro. This is because 
> {{QueryParserDriver.makeMacroDef}} requires a filename. 
> {{QueryParserDriver.makeMacroDef}} should be changed to not assume that the 
> script input is from a file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2791) Pig does not work with ViewFileSystem

2012-08-14 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated PIG-2791:
-

Summary: Pig does not work with ViewFileSystem  (was: Pig does not work 
with Namenode Federation)

Updated summary since the issue is unrelated to federation.

> Pig does not work with ViewFileSystem
> -
>
> Key: PIG-2791
> URL: https://issues.apache.org/jira/browse/PIG-2791
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.10.0
> Environment: Pig QE
>Reporter: patrick white
>Assignee: Rohini Palaniswamy
>Priority: Blocker
> Attachments: asf_test_notes.txt, FixMiniCluster-branch10-1.patch, 
> FixMiniCluster-branch10.patch, PIG-2791-0.patch, PIG-2791-1.patch, 
> PIG-2791-2.patch, PIG-2791-3-branch10.patch, PIG-2791-3-trunk.patch, 
> PIG-2791-4-branch10.patch, PIG-2791-4-trunk.patch, PIG-2791-5-trunk.patch
>
>
> The Yahoo Pig QE team ran into a blocking issue when trying to test 
> Client-Side Mount Tables on a federated cluster with two NNs; this blocks 
> Pig testing on federation. 
> Federation relies strongly on the use of CSMT with viewFS. QE found that in 
> this configuration it is not possible to enter the grunt shell, because Pig 
> makes a call to getDefaultReplication() on the fs, which is ambiguous over 
> viewFS and causes core to throw an 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: "getDefaultReplication 
> on empty path is invalid".
> This in turn causes Pig to exit with an internal error as follows:
> 2012-07-06 22:20:25,657 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.10.1.0.1206081058 (r1348169) compiled Jun 08 2012, 17:58:42
> 2012-07-06 22:20:26,074 [main] WARN  org.apache.hadoop.conf.Configuration - 
> mapred.used.genericoptionsparser is deprecated. Instead, use 
> mapreduce.client.genericoptionsparser.used
> 2012-07-06 22:20:26,076 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: viewfs:///
> 2012-07-06 22:20:26,080 [main] WARN  org.apache.hadoop.conf.Configuration - 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - ERROR 2999: 
> Unexpected internal error. getDefaultReplication on empty path is invalid
> 2012-07-06 22:20:26,522 [main] WARN  org.apache.pig.Main - There is no log 
> file to write to.
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication 
> on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:482)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:77)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:205)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:118)
> at org.apache.pig.impl.PigContext.connect(PigContext.java:208)
> at org.apache.pig.PigServer.<init>(PigServer.java:246)
> at org.apache.pig.PigServer.<init>(PigServer.java:231)
> at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:47)
> at org.apache.pig.Main.run(Main.java:487)
> at org.apache.pig.Main.main(Main.java:111)
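
A hedged sketch of the API mismatch in the quoted report: over viewfs:// the 
no-argument getDefaultReplication() has no single mount point to answer for and 
throws NotInMountpointException, while the Path-qualified overload can resolve 
a mount point:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "viewfs:///"); // assumes a mount table is configured
        FileSystem fs = FileSystem.get(conf);
        // throws NotInMountpointException over viewfs, as in the trace above
        short ambiguous = fs.getDefaultReplication();
        // resolves the mount point for the given path and succeeds
        short resolved = fs.getDefaultReplication(new Path("/user"));
    }
}
{code}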

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2875) Add recursive record support to AvroStorage

2012-08-14 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2875:
---

Attachment: avro_test_files.tar.gz
PIG-2869.patch

Review board:
https://reviews.apache.org/r/6536/

> Add recursive record support to AvroStorage
> ---
>
> Key: PIG-2875
> URL: https://issues.apache.org/jira/browse/PIG-2875
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Attachments: avro_test_files.tar.gz, PIG-2869.patch
>
>
> Currently, AvroStorage does not allow recursive records in Avro schema 
> because it is not possible to define Pig schema for recursive records. (i.e. 
> records that have self-referencing fields cause an infinite loop, so they are 
> not supported.)
> Even though there is no natural way of handling recursive records in Pig 
> schema, I'd like to propose the following workaround: mapping recursive 
> records to bytearray.
> Take for example the following Avro schema:
> {code}
> {
>   "type" : "record",
>   "name" : "RECURSIVE_RECORD",
>   "fields" : [ {
> "name" : "value",
> "type" : [ "null", "int" ]
>   }, {
> "name" : "next",
> "type" : [ "null", "RECURSIVE_RECORD" ]
>   } ]
> }
> {code}
> and the following data:
> {code}
> {"value":1,"next":{"RECURSIVE_RECORD":{"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}
>  
> {"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}}} 
> {"value":3,"next":null}
> {code}
> Then, we can define Pig schema as follows:
> {code}
> {value: int,next: bytearray}
> {code}
> Even though Pig thinks that the "next" fields are bytearray, they're actually 
> loaded as tuples since AvroStorage uses Avro schema when loading files.
> {code}
> grunt> in = LOAD 'test_recursive_schema.avro' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ();
> grunt> dump in;
> (1,(2,(3,)))
> (2,(3,))
> (3,)
> {code}
> At this point, we have discrepancy between Avro schema and Pig schema; 
> nevertheless, we can still refer to each field of tuples as follows:
> {code}
> grunt> first = FOREACH in GENERATE $0;
> grunt> dump first;
> (1)
> (2)
> (3)
> or
> grunt> second = FOREACH in GENERATE $1.$0;
> grunt> dump second;
> (2)
> (3)
> ()
> {code}
> Lastly, we can store these tuples as Avro files by specifying schema. Since 
> we can no longer construct Avro schema from Pig schema, it is required for 
> the user to provide Avro schema via the 'schema' parameter in STORE function.
> {code}
> grunt> STORE first INTO 'output' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '[ "null", 
> "int" ]' );
> or
> grunt> STORE in INTO 'output' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '
> {
>   "type" : "record",
>   "name" : "recursive_schema",
>   "fields" : [ { 
> "name" : "value",
> "type" : [ "null", "int" ]
>   }, {
> "name" : "next",
> "type" : [ "null", "recursive_schema" ]
>   } ] 
> }
> ' );
> {code}
> To implement this workaround, the following work is required:
> - Update the current generic union check so that it can handle recursive 
> records. Currently, AvroStorage checks if the Avro schema contains 1) 
> recursive records and 2) generic unions, and fails if so. But since I am 
> going to remove the 1st check, the 2nd check should be able to handle 
> recursive records without stack overflow.
> - Update AvroSchema2Pig so that recursive records can be detected and mapped 
> to bytearrays in Pig schema.
> - Add the 'no_schema_check' parameter to STORE function so that results can 
> be stored even though there exists discrepancy between Avro schema and Pig 
> schema. Since Avro schema for STORE function cannot be constructed from Pig 
> schema, it has to be specified by the user via the 'schema' parameter, and 
> schema check has to be disabled by 'no_schema_check'.
> - Update AvroStorage wiki.
> - Add unit tests.
> I do not think that any incompatibility issues will be introduced by this.
> P.S. The reason why I chose to map recursive records to bytearray instead of 
> empty tuple is because I cannot refer to any field if I use empty tuple. For 
> example, if Pig schema is defined as follows:
> {code}
> {value: int,next: ()}
> {code}
> I get an exception when I attempt to refer to any field in loaded tuples 
> since their schema is not defined (i.e. empty tuple).
> {code}
> ERROR 1127: Index 0 out of range in schema
> {code}
> This is all what I found by trials and errors, so there might be something 
> that I am missing here. If so, please let me know.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2875) Add recursive record support to AvroStorage

2012-08-14 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2875:
---

Status: Patch Available  (was: Open)

This was originally PIG-2869 and re-created due to INFRA-5131.

> Add recursive record support to AvroStorage
> ---
>
> Key: PIG-2875
> URL: https://issues.apache.org/jira/browse/PIG-2875
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Attachments: avro_test_files.tar.gz, PIG-2869.patch
>
>
> Currently, AvroStorage does not allow recursive records in Avro schema 
> because it is not possible to define Pig schema for recursive records. (i.e. 
> records that have self-referencing fields cause an infinite loop, so they are 
> not supported.)
> Even though there is no natural way of handling recursive records in Pig 
> schema, I'd like to propose the following workaround: mapping recursive 
> records to bytearray.
> Take for example the following Avro schema:
> {code}
> {
>   "type" : "record",
>   "name" : "RECURSIVE_RECORD",
>   "fields" : [ {
> "name" : "value",
> "type" : [ "null", "int" ]
>   }, {
> "name" : "next",
> "type" : [ "null", "RECURSIVE_RECORD" ]
>   } ]
> }
> {code}
> and the following data:
> {code}
> {"value":1,"next":{"RECURSIVE_RECORD":{"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}
>  
> {"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}}} 
> {"value":3,"next":null}
> {code}
> Then, we can define Pig schema as follows:
> {code}
> {value: int,next: bytearray}
> {code}
> Even though Pig thinks that the "next" fields are bytearray, they're actually 
> loaded as tuples since AvroStorage uses Avro schema when loading files.
> {code}
> grunt> in = LOAD 'test_recursive_schema.avro' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ();
> grunt> dump in;
> (1,(2,(3,)))
> (2,(3,))
> (3,)
> {code}
> At this point, we have discrepancy between Avro schema and Pig schema; 
> nevertheless, we can still refer to each field of tuples as follows:
> {code}
> grunt> first = FOREACH in GENERATE $0;
> grunt> dump first;
> (1)
> (2)
> (3)
> or
> grunt> second = FOREACH in GENERATE $1.$0;
> grunt> dump second;
> (2)
> (3)
> ()
> {code}
> Lastly, we can store these tuples as Avro files by specifying schema. Since 
> we can no longer construct Avro schema from Pig schema, it is required for 
> the user to provide Avro schema via the 'schema' parameter in STORE function.
> {code}
> grunt> STORE first INTO 'output' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '[ "null", 
> "int" ]' );
> or
> grunt> STORE in INTO 'output' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '
> {
>   "type" : "record",
>   "name" : "recursive_schema",
>   "fields" : [ { 
> "name" : "value",
> "type" : [ "null", "int" ]
>   }, {
> "name" : "next",
> "type" : [ "null", "recursive_schema" ]
>   } ] 
> }
> ' );
> {code}
> To implement this workaround, the following work is required:
> - Update the current generic union check so that it can handle recursive 
> records. Currently, AvroStorage checks if the Avro schema contains 1) 
> recursive records and 2) generic unions, and fails if so. But since I am 
> going to remove the 1st check, the 2nd check should be able to handle 
> recursive records without stack overflow.
> - Update AvroSchema2Pig so that recursive records can be detected and mapped 
> to bytearrays in Pig schema.
> - Add the 'no_schema_check' parameter to STORE function so that results can 
> be stored even though there exists discrepancy between Avro schema and Pig 
> schema. Since Avro schema for STORE function cannot be constructed from Pig 
> schema, it has to be specified by the user via the 'schema' parameter, and 
> schema check has to be disabled by 'no_schema_check'.
> - Update AvroStorage wiki.
> - Add unit tests.
> I do not think that any incompatibility issues will be introduced by this.
> P.S. The reason why I chose to map recursive records to bytearray instead of 
> empty tuple is because I cannot refer to any field if I use empty tuple. For 
> example, if Pig schema is defined as follows:
> {code}
> {value: int,next: ()}
> {code}
> I get an exception when I attempt to refer to any field in loaded tuples 
> since their schema is not defined (i.e. empty tuple).
> {code}
> ERROR 1127: Index 0 out of range in schema
> {code}
> This is all what I found by trials and errors, so there might be something 
> that I am missing here. If so, please let me know.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

2012-08-14 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
---

(Updated Aug. 14, 2012, 10:23 a.m.)


Review request for pig.


Summary (updated)
-

PIG-2875 Add recursive record support to AvroStorage


Description
---

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modify inconvert() in AvroSchema2Pig so that it can map recursive records to 
bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro 
schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be 
passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 
'schema_file' and 'data' cannot refer to Avro schema that contains recursive 
records.

AvroStorage works as follows:

1) The Pig schema maps recursive records to bytearrays, so there is a 
discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though the Pig schema defines 
them as bytearrays, and they can be referred to by position (e.g. $0, $1.$0, 
etc.).
3) To store recursive records, the Avro schema must be provided via the 
'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must 
be enabled because the schema check would otherwise fail due to the discrepancy 
between Avro schema and Pig schema (see the sketch after this list).
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. 
This is because AvroSchemaManager cannot handle recursive records for now. The 
recursive record check is added to AvroSchemaManager, so if Avro schema that 
contains recursive records is specified by these parameters, an exception is 
thrown.
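
A minimal sketch of point 3 above (assuming the test_recursive_schema.avro file 
from the PIG-2875 description; the exact combination of StoreFunc parameters is 
an assumption, not verified output):

{code}
grunt> in = LOAD 'test_recursive_schema.avro' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage ();
grunt> STORE in INTO 'output' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage (
'no_schema_check',
'schema', '
{
  "type" : "record",
  "name" : "recursive_schema",
  "fields" : [ {
    "name" : "value",
    "type" : [ "null", "int" ]
  }, {
    "name" : "next",
    "type" : [ "null", "recursive_schema" ]
  } ]
}
' );
{code}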


This addresses bug PIG-2875.
https://issues.apache.org/jira/browse/PIG-2875


Diffs
-

  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java
 6b1d2a1 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java
 1939d3e 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 c9f7d81 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
 e24b495 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
 2fab3f7 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
 040234f 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro
 4e23e73 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro
 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
---

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, 
and another record.
2) Load Avro files that contain recursive records, generate new relations, 
apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, 
schema_file, and data.


Thanks,

Cheolsoo Park



[jira] [Created] (PIG-2875) Add recursive record support to AvroStorage

2012-08-14 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-2875:
--

 Summary: Add recursive record support to AvroStorage
 Key: PIG-2875
 URL: https://issues.apache.org/jira/browse/PIG-2875
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park


Currently, AvroStorage does not allow recursive records in Avro schema because 
it is not possible to define a Pig schema for recursive records (i.e. records 
that have self-referencing fields cause an infinite loop, so they are not 
supported).

Even though there is no natural way of handling recursive records in Pig 
schema, I'd like to propose the following workaround: mapping recursive records 
to bytearray.

Take for example the following Avro schema:
{code}
{
  "type" : "record",
  "name" : "RECURSIVE_RECORD",
  "fields" : [ {
"name" : "value",
"type" : [ "null", "int" ]
  }, {
"name" : "next",
"type" : [ "null", "RECURSIVE_RECORD" ]
  } ]
}
{code}

and the following data:

{code}
{"value":1,"next":{"RECURSIVE_RECORD":{"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}
 
{"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}}} 
{"value":3,"next":null}
{code}

Then, we can define Pig schema as follows:

{code}
{value: int,next: bytearray}
{code}

Even though Pig thinks that the "next" fields are bytearray, they're actually 
loaded as tuples since AvroStorage uses Avro schema when loading files.

{code}
grunt> in = LOAD 'test_recursive_schema.avro' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage ();
grunt> dump in;
(1,(2,(3,)))
(2,(3,))
(3,)
{code}

At this point, we have a discrepancy between Avro schema and Pig schema; 
nevertheless, we can still refer to each field of the tuples as follows:

{code}
grunt> first = FOREACH in GENERATE $0;
grunt> dump first;
(1)
(2)
(3)

or

grunt> second = FOREACH in GENERATE $1.$0;
grunt> dump second;
(2)
(3)
()
{code}

Lastly, we can store these tuples as Avro files by specifying a schema. Since 
the Avro schema can no longer be constructed from the Pig schema, the user must 
provide the Avro schema via the 'schema' parameter in the STORE function.

{code}
grunt> STORE first INTO 'output' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '[ "null", "int" 
]' );

or

grunt> STORE in INTO 'output' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '
{
  "type" : "record",
  "name" : "recursive_schema",
  "fields" : [ { 
"name" : "value",
"type" : [ "null", "int" ]
  }, {
"name" : "next",
"type" : [ "null", "recursive_schema" ]
  } ] 
}
' );
{code}

To implement this workaround, the following work is required:
- Update the current generic union check so that it can handle recursive 
records. Currently, AvroStorage checks if the Avro schema contains 1) recursive 
records and 2) generic unions, and fails if so. But since I am going to remove 
the 1st check, the 2nd check should be able to handle recursive records without 
stack overflow.
- Update AvroSchema2Pig so that recursive records can be detected and mapped to 
bytearrays in Pig schema.
- Add the 'no_schema_check' parameter to the STORE function so that results can 
be stored even though there is a discrepancy between Avro schema and Pig 
schema. Since the Avro schema for the STORE function cannot be constructed from 
the Pig schema, it has to be specified by the user via the 'schema' parameter, 
and the schema check has to be disabled by 'no_schema_check'.
- Update AvroStorage wiki.
- Add unit tests.

I do not think that any incompatibility issues will be introduced by this.

P.S. The reason I chose to map recursive records to bytearray instead of an 
empty tuple is that I cannot refer to any field if I use an empty tuple. For 
example, if the Pig schema is defined as follows:

{code}
{value: int,next: ()}
{code}

I get an exception when I attempt to refer to any field in the loaded tuples 
since their schema is not defined (i.e. the tuple is empty).

{code}
ERROR 1127: Index 0 out of range in schema
{code}
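
For illustration, a sketch of how that reference fails under the empty-tuple 
mapping (hypothetical; this is exactly the mapping the proposed workaround 
avoids):

{code}
grunt> -- suppose 'next' had been mapped to an empty tuple, i.e. the Pig
grunt> -- schema were {value: int,next: ()} instead of {value: int,next: bytearray}
grunt> second = FOREACH in GENERATE $1.$0;
ERROR 1127: Index 0 out of range in schema
{code}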

This is all I found by trial and error, so there might be something that I am 
missing here. If so, please let me know.

Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: PIG-2869 Add recursive record support to AvroStorage

2012-08-14 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
---

(Updated Aug. 14, 2012, 10:22 a.m.)


Review request for pig.


Description
---

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modify inconvert() in AvroSchema2Pig so that it can map recursive records to 
bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro 
schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be 
passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 
'schema_file' and 'data' cannot refer to Avro schema that contains recursive 
records.

AvroStorage works as follows:

1) The Pig schema maps recursive records to bytearrays, so there is a 
discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though the Pig schema defines 
them as bytearrays, and they can be referred to by position (e.g. $0, $1.$0, 
etc.).
3) To store recursive records, the Avro schema must be provided via the 
'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must 
be enabled because the schema check would otherwise fail due to the discrepancy 
between Avro schema and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. 
This is because AvroSchemaManager cannot handle recursive records for now. The 
recursive record check is added to AvroSchemaManager, so if Avro schema that 
contains recursive records is specified by these parameters, an exception is 
thrown.


This addresses bug PIG-2875.
https://issues.apache.org/jira/browse/PIG-2875


Diffs
-

  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java
 6b1d2a1 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java
 1939d3e 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 c9f7d81 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
 e24b495 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
 2fab3f7 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
 040234f 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro
 4e23e73 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro
 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
---

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, 
and another record.
2) Load Avro files that contain recursive records, generate new relations, 
apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, 
schema_file, and data.


Thanks,

Cheolsoo Park



Re: Review Request: RANK function like in SQL

2012-08-14 Thread aavendan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/
---

(Updated Aug. 14, 2012, 8:19 a.m.)


Review request for pig, aavendan and Gianmarco De Francisci Morales.


Changes
---

Rank operator: all JUnit and e2e tests (run on a cluster) passed.
Also, all new classes and main changes are commented.


Description
---

Review board for https://issues.apache.org/jira/browse/PIG-2353


This addresses bug PIG-2353.
https://issues.apache.org/jira/browse/PIG-2353
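
For context, a minimal sketch of the RANK syntax proposed in PIG-2353 (relation 
and field names are hypothetical):

{code}
A = LOAD 'students' AS (name: chararray, gpa: double);
B = RANK A;              -- prepend a sequential row number to each tuple
C = RANK A BY gpa DESC;  -- rank tuples by gpa, highest first
DUMP C;
{code}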


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduceCounter.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllSameRalationalNodesVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/UidResetter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/LineageFindRelVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/SchemaAliasVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/TypeCheckingRelVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/IllustratorAttacher.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/LocalMapReduceSimulator.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/ScriptState.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/deployers/ExistingClusterDeployer.pm
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/test/

[jira] [Commented] (PIG-2498) e2e tests failing in some cases due to incorrect unix sort args

2012-08-14 Thread fang fang chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433984#comment-13433984
 ] 

fang fang chen commented on PIG-2498:
-

+1
I also encountered some tests failing with the error "Sort check failed". With 
this patch, the originally failing tests now pass.

> e2e tests failing in some cases due to incorrect unix sort args
> ---
>
> Key: PIG-2498
> URL: https://issues.apache.org/jira/browse/PIG-2498
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Attachments: PIG-2498.patch
>
>
> Some e2e tests are failing for me against 23 due to what I think are 
> incorrect arguments to Unix sort. For example, in Order_6:
> {noformat}
>   'num' => 6,
>   'pig' => q\a = load ':INPATH:/singlefile/studenttab10k';
> c = order a by $0;
> store c into ':OUTPATH:';\,
>   'sortArgs' => ['-t', '  ', '+0', '-1'],
> {noformat}
> The Pig job is sorting by the first column; however, Unix sort is being told 
> to sort by the first and second columns.
> From the GNU sort manual (note that pos2 is _inclusive_): 
> http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
> {noformat}
> '-k pos1[,pos2]'
> '--key=pos1[,pos2]'
> Specify a sort field that consists of the part of the line between pos1 and 
> pos2 (or the end of the line, if pos2 is omitted), inclusive.
> ...
> On older systems, sort supports an obsolete origin-zero syntax '+pos1 
> [-pos2]' for specifying sort keys. The obsolete sequence 'sort +a.x -b.y' is 
> equivalent to 'sort -k a+1.x+1,b' if y is '0' or absent, otherwise it is 
> equivalent to 'sort -k a+1.x+1,b+1.y'.
> {noformat}
> I verified this by running the sort manually with '+0 -1' and '+0 -0'; the 
> first case fails and the second passes.
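
For reference, a minimal sketch of the modern key syntax for a 
first-column-only sort (assuming GNU coreutils sort and tab-delimited input, as 
in the e2e tests; the invocation details are an assumption):

{noformat}
# -t $'\t'  use tab as the field delimiter
# -k 1,1    key starts at field 1 and ends at field 1, inclusive;
#           a bare '-k 1' would extend the key to the end of the line
sort -t $'\t' -k 1,1 studenttab10k
{noformat}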

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira