[jira] Subscription: PIG patch available

2016-02-05 Thread jira
Issue Subscription
Filter: PIG patch available (30 issues)

Subscriber: pigdaily

Key       Summary
PIG-4759  Fix Classresolution_1 e2e failure
          https://issues.apache.org/jira/browse/PIG-4759
PIG-4745  DataBag should protect content of passed list of tuples
          https://issues.apache.org/jira/browse/PIG-4745
PIG-4734  TOMAP schema inferring breaks some scripts in type checking for bincond
          https://issues.apache.org/jira/browse/PIG-4734
PIG-4690  Union with self replicate join will fail in Tez
          https://issues.apache.org/jira/browse/PIG-4690
PIG-4686  Backend code should not call AvroStorageUtils.getPaths
          https://issues.apache.org/jira/browse/PIG-4686
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
          https://issues.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
          https://issues.apache.org/jira/browse/PIG-4656
PIG-4641  Print the instance of Object without using toString()
          https://issues.apache.org/jira/browse/PIG-4641
PIG-4598  Allow user defined plan optimizer rules
          https://issues.apache.org/jira/browse/PIG-4598
PIG-4581  thread safe issue in NodeIdGenerator
          https://issues.apache.org/jira/browse/PIG-4581
PIG-4551  Partition filter is not pushed down in case of SPLIT
          https://issues.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
          https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
          https://issues.apache.org/jira/browse/PIG-4515
PIG-4455  Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
          https://issues.apache.org/jira/browse/PIG-4455
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
          https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
          https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4281  Fix TestFinish for Spark engine
          https://issues.apache.org/jira/browse/PIG-4281
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4111  Make Pig compiles with avro-1.7.7
          https://issues.apache.org/jira/browse/PIG-4111
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
          https://issues.apache.org/jira/browse/PIG-3866
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
          https://issues.apache.org/jira/browse/PIG-3864
PIG-3851  Upgrade jline to 2.11
          https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384


Jenkins build is still unstable: Pig-trunk-commit #2289

2016-02-05 Thread Apache Jenkins Server
See 



[jira] [Updated] (PIG-4243) Fix "TestStore" for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4243:
--
Attachment: PIG-4243.patch

> Fix "TestStore" for Spark engine
> 
>
> Key: PIG-4243
> URL: https://issues.apache.org/jira/browse/PIG-4243
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4243.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> 1. Build the Spark and Pig environment according to PIG-4168.
> 2. Add TestStore to $PIG_HOME/test/spark-tests:
> cat $PIG_HOME/test/spark-tests
> **/TestStore
> 3. Run the TestStore unit test:
> ant test-spark
> 4. The unit test fails; the error log is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4759) Fix Classresolution_1 e2e failure

2016-02-05 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4759:

Assignee: Rohini Palaniswamy
  Status: Patch Available  (was: Open)

> Fix Classresolution_1 e2e failure
> -
>
> Key: PIG-4759
> URL: https://issues.apache.org/jira/browse/PIG-4759
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4759-1.patch
>
>
>   We had left it as a known issue to be fixed later, as it was a very odd 
> and uncommon usage put in just for this particular testcase: store into a 
> file with one StoreFunc, but read it back with a different reader in the same 
> script. But we came across one of our users doing exactly that: storing bags 
> using PigStorage, reading them back with TextLoader and processing them as 
> plain strings later on.





[jira] [Updated] (PIG-4759) Fix Classresolution_1 e2e failure

2016-02-05 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4759:

Attachment: PIG-4759-1.patch

> Fix Classresolution_1 e2e failure
> -
>
> Key: PIG-4759
> URL: https://issues.apache.org/jira/browse/PIG-4759
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4759-1.patch
>
>
>   We had left it as a known issue to be fixed later, as it was a very odd 
> and uncommon usage put in just for this particular testcase: store into a 
> file with one StoreFunc, but read it back with a different reader in the same 
> script. But we came across one of our users doing exactly that: storing bags 
> using PigStorage, reading them back with TextLoader and processing them as 
> plain strings later on.





[jira] [Commented] (PIG-4243) Fix "TestStore" for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133852#comment-15133852
 ] 

liyunzhang_intel commented on PIG-4243:
---

The build at https://builds.apache.org/job/Pig-spark/298/#showFailuresLink shows 
that the following unit tests fail:
org.apache.pig.test.TestStore.testCleanupOnFailureMultiStore
org.apache.pig.test.TestStore.testCleanupOnFailure

PIG-4243.patch fixes these two failures.

Changes in PIG-4243.patch:
1. Clean up all of the stores on failure (call PigStorage#cleanupOnFailure 
for each store).
2. Add engine-specific checks in TestStoreBase#testCleanupOnFailureMultiStore, 
because the expected results differ between execution engines.

More detail on TestStoreBase#testCleanupOnFailureMultiStore. The script looks 
like this:
{code}
A = load xx;
store A into '1.out' using DummyStore('true','1');   -- first job should fail
store A into '2.out' using DummyStore('false','1');  -- second job should succeed
{code}

After multiquery optimization, the Spark plan is:
{code}
Split - scope-14
|   |
|   a: Store(hdfs://1.out:myudfs.DummyStore('true','1')) - scope-4
|   |
|   a: Store(hdfs://2.out:myudfs.DummyStore('false','1')) - scope-7
|
|---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/multiStore.txt:org.apache.pig.builtin.PigStorage) - scope-0--
{code}
In Spark mode, when the sub-plan of a POSplit contains two POStores and the 
first job fails and throws an exception, the second job is not executed, so 
FILE_SETUPJOB_CALLED (or FILE_SETUPTASK_CALLED) is never generated for the 
second job. *But why is FILE_SETUPJOB_CALLED (or FILE_SETUPTASK_CALLED) 
generated for the second job in MR mode, even though the second job is not 
executed there either?*
In MR mode:
FILE_SETUPJOB_CALLED is generated in 
org.apache.pig.test.TestStore.DummyOutputCommitter#setupJob.
DummyOutputCommitter#setupJob stack trace:
{code}
DummyOutputCommitter.setupJob
  -> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setupJob(PigOutputCommitter.java:407)
  -> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:511)
{code}
 
  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter#PigOutputCommitter:
{code}
public PigOutputCommitter(TaskAttemptContext context,
        List mapStores, List reduceStores)
        throws IOException {
    // create and store the map and reduce output committers
    // Kelly's comment: in the case above there will be 2 mapOutputCommitters,
    // so DummyOutputCommitter#setupJob is later invoked for both, and the
    // FILE_SETUPJOB_CALLED markers of the first and second stores are
    // generated before the MR job starts to compute.
    mapOutputCommitters = getCommitters(context, mapStores);
    reduceOutputCommitters = getCommitters(context, reduceStores);
    recoverySupported = context.getConfiguration().getBoolean(
            PigConfiguration.PIG_OUTPUT_COMMITTER_RECOVERY, false);
}
{code}
   
In Spark mode:
DummyOutputCommitter#setupJob stack trace:
{code}
DummyOutputCommitter.setupJob
  -> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setupJob(PigOutputCommitter.java:407)
  -> org.apache.spark.rdd.PairRDDFunctions#saveAsNewAPIHadoopDataset
{code}
  
In Spark mode, one store generates one Spark job, and the PigOutputCommitter 
has only one reduceOutputCommitter for that job.
StoreConverter#configureStorer:
{code}
// Kelly's comment: we only set the location of the current store as
// JobControlCompiler.PIG_REDUCE_STORES, even when there is more than one
// POStore in the script. In Spark, store is an action, and one store
// generates one job. So in the case above there are two jobs, executed one
// by one; when the first job fails, the second job is stopped and
// FILE_SETUPJOB_CALLED (FILE_SETUPTASK_CALLED) of the second job is not
// generated.
private static POStore configureStorer(JobConf jobConf,
        PhysicalOperator op) throws IOException {

    jobConf.set(JobControlCompiler.PIG_MAP_STORES,
            ObjectSerializer.serialize(Lists.newArrayList()));
    jobConf.set(JobControlCompiler.PIG_REDUCE_STORES,
            ObjectSerializer.serialize(storeLocations));

}
{code}
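A toy Java model of the MR vs. Spark difference described above may help. It is a sketch, not Pig code: StoreSim, CommitterSetupSim, runMrStyle and runSparkStyle are hypothetical names, and the model only records which stores would get a setupJob marker. In the MR-style path every committer is set up before the single job runs; in the Spark-style path each store is its own job, run in order, and the first failure stops the rest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a POStore with a DummyStore-like "fail" flag.
class StoreSim {
    final String location;
    final boolean fail;

    StoreSim(String location, boolean fail) {
        this.location = location;
        this.fail = fail;
    }
}

// Hypothetical model; it does not use any Pig, Hadoop or Spark classes.
class CommitterSetupSim {

    // MR-style: one job owns a committer per store, and setupJob is called on
    // every committer before any record is processed, so every store gets a
    // setup marker even if one store later fails the job.
    static List<String> runMrStyle(List<StoreSim> stores) {
        List<String> setupCalled = new ArrayList<>();
        for (StoreSim s : stores) {
            setupCalled.add(s.location);
        }
        return setupCalled;
    }

    // Spark-style: each store is a separate action (job), executed one by one.
    // setupJob runs only when that store's job starts; a failing job throws,
    // which is modelled here by stopping the loop, so later stores never run.
    static List<String> runSparkStyle(List<StoreSim> stores) {
        List<String> setupCalled = new ArrayList<>();
        for (StoreSim s : stores) {
            setupCalled.add(s.location);
            if (s.fail) {
                break;
            }
        }
        return setupCalled;
    }
}
```

With the stores from the script above (1.out fails, 2.out succeeds), runMrStyle reports both locations while runSparkStyle reports only 1.out, which is why the test needs engine-specific expectations.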

[~pallavi.rao], [~mohitsabharwal], [~kexianda]: please help review 
PIG-4243.patch. Thanks!


> Fix "TestStore" for Spark engine
> 
>
> Key: PIG-4243
> URL: https://issues.apache.org/jira/browse/PIG-4243
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix 

Jenkins build is back to normal : Pig-trunk #1874

2016-02-05 Thread Apache Jenkins Server
See 



[jira] [Updated] (PIG-4281) Fix TestFinish for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4281:
--
Status: Patch Available  (was: Reopened)

> Fix TestFinish for Spark engine
> ---
>
> Key: PIG-4281
> URL: https://issues.apache.org/jira/browse/PIG-4281
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4281.patch, TEST-org.apache.pig.test.TestFinish.txt
>
>
> error log is attached





[jira] [Assigned] (PIG-4281) Fix TestFinish for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned PIG-4281:
-

Assignee: liyunzhang_intel  (was: Mohit Sabharwal)

> Fix TestFinish for Spark engine
> ---
>
> Key: PIG-4281
> URL: https://issues.apache.org/jira/browse/PIG-4281
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4281.patch, TEST-org.apache.pig.test.TestFinish.txt
>
>
> error log is attached





[jira] [Updated] (PIG-4281) Fix TestFinish for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4281:
--
Attachment: (was: PIG-4281.patch)

> Fix TestFinish for Spark engine
> ---
>
> Key: PIG-4281
> URL: https://issues.apache.org/jira/browse/PIG-4281
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: TEST-org.apache.pig.test.TestFinish.txt
>
>
> error log is attached





[jira] [Updated] (PIG-4281) Fix TestFinish for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4281:
--
Attachment: PIG-4281.patch

> Fix TestFinish for Spark engine
> ---
>
> Key: PIG-4281
> URL: https://issues.apache.org/jira/browse/PIG-4281
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4281.patch, TEST-org.apache.pig.test.TestFinish.txt
>
>
> error log is attached





[jira] [Commented] (PIG-4281) Fix TestFinish for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135587#comment-15135587
 ] 

liyunzhang_intel commented on PIG-4281:
---

[~pallavi.rao], [~mohitsabharwal], [~kexianda]: please help review PIG-4281.patch. Thanks!

> Fix TestFinish for Spark engine
> ---
>
> Key: PIG-4281
> URL: https://issues.apache.org/jira/browse/PIG-4281
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4281.patch, TEST-org.apache.pig.test.TestFinish.txt
>
>
> error log is attached





[jira] [Updated] (PIG-4281) Fix TestFinish for Spark engine

2016-02-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4281:
--
Attachment: PIG-4281.patch

Changes are:
1. Call UDFFinishVisitor in JobGraphBuilder#visitSparkOp to execute 
POUserFunc#finish.
2. Initialize PigMapReduce.sJobConfInternal.get in 
TestFinish#testFinishInReduceMR and TestFinish#testFinishInMapMR.
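The first change can be pictured as a plan walk that calls finish() once on every user function. The sketch below is hypothetical: PlanNode, UdfNode, OpNode and FinishWalker are simplified stand-ins, not Pig's UDFFinishVisitor, POUserFunc or PhysicalPlan classes, and it does not model when JobGraphBuilder#visitSparkOp triggers the walk. A visited set is kept because a plan is a DAG and a shared node must not be finished twice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical plan node; real Pig plans use PhysicalOperator/PhysicalPlan.
abstract class PlanNode {
    final List<PlanNode> inputs = new ArrayList<>();
}

// Hypothetical stand-in for an operator that wraps a UDF (like POUserFunc).
class UdfNode extends PlanNode {
    int finishCalls = 0;

    // Stands in for forwarding to the UDF's finish() hook.
    void finish() {
        finishCalls++;
    }
}

// Any other operator (load, filter, store, ...).
class OpNode extends PlanNode {
}

class FinishWalker {
    // Depth-first walk from the given roots towards their inputs, calling
    // finish() exactly once per UDF node even if it is reachable along
    // several paths.
    static void finishAll(List<PlanNode> roots) {
        Set<PlanNode> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        for (PlanNode root : roots) {
            walk(root, visited);
        }
    }

    private static void walk(PlanNode node, Set<PlanNode> visited) {
        if (!visited.add(node)) {
            return; // already handled via another path
        }
        if (node instanceof UdfNode) {
            ((UdfNode) node).finish();
        }
        for (PlanNode in : node.inputs) {
            walk(in, visited);
        }
    }
}
```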

> Fix TestFinish for Spark engine
> ---
>
> Key: PIG-4281
> URL: https://issues.apache.org/jira/browse/PIG-4281
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4281.patch, TEST-org.apache.pig.test.TestFinish.txt
>
>
> error log is attached





[jira] [Resolved] (PIG-4784) Enable "pig.disable.counter“ for spark engine

2016-02-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-4784.
--
Resolution: Fixed

Committed to Spark branch. Thanks, Liyun!

> Enable "pig.disable.counter“ for spark engine
> -
>
> Key: PIG-4784
> URL: https://issues.apache.org/jira/browse/PIG-4784
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4784.patch, PIG-4784_2.patch
>
>
> When pig.disable.counter is set to "true" in conf/pig.properties, the 
> counters that calculate the number of input records and output records are 
> disabled. 
> The following unit tests are designed to test this, but they currently fail:
> org.apache.pig.test.TestPigRunner#testDisablePigCounters
> org.apache.pig.test.TestPigRunner#testDisablePigCounters2
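A minimal sketch of what the flag is meant to control, assuming it is read as a boolean property that defaults to false when absent. RecordCounterSketch and its methods are hypothetical and do not reflect how the Spark engine actually wires Pig's counters; only the property name pig.disable.counter comes from the issue.

```java
import java.util.Properties;

// Hypothetical record counter that honours a pig.disable.counter-style flag.
class RecordCounterSketch {
    static final String DISABLE_COUNTER = "pig.disable.counter";

    private final boolean disabled;
    private long inputRecords = 0;
    private long outputRecords = 0;

    RecordCounterSketch(Properties props) {
        // Assumption: absent, or anything other than "true", keeps counters on.
        this.disabled = Boolean.parseBoolean(props.getProperty(DISABLE_COUNTER, "false"));
    }

    void onInputRecord() {
        if (!disabled) {
            inputRecords++;
        }
    }

    void onOutputRecord() {
        if (!disabled) {
            outputRecords++;
        }
    }

    long inputRecords() {
        return inputRecords;
    }

    long outputRecords() {
        return outputRecords;
    }
}
```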


