[jira] [Resolved] (PIG-4870) Enable MergeJoin testcase in TestCollectedGroup for spark engine

2016-09-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-4870.
--
Resolution: Fixed

Committed to Spark branch. Thanks, Xianda!

> Enable MergeJoin testcase in TestCollectedGroup for spark engine
> 
>
> Key: PIG-4870
> URL: https://issues.apache.org/jira/browse/PIG-4870
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4870.patch
>
>
> TestCollectedGroup.testMapsideGroupWithMergeJoin was disabled (PIG-4781).
> When MergeJoin (PIG-4810) is ready, we can enable the unit test for the spark 
> engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4870) Enable MergeJoin testcase in TestCollectedGroup for spark engine

2016-09-06 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469357#comment-15469357
 ] 

liyunzhang_intel commented on PIG-4870:
---

[~kexianda]: +1. [~xuefuz]: please help check in this patch.

> Enable MergeJoin testcase in TestCollectedGroup for spark engine
> 
>
> Key: PIG-4870
> URL: https://issues.apache.org/jira/browse/PIG-4870
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4870.patch
>
>
> TestCollectedGroup.testMapsideGroupWithMergeJoin was disabled (PIG-4781).
> When MergeJoin (PIG-4810) is ready, we can enable the unit test for the spark 
> engine.





[jira] [Comment Edited] (PIG-4870) Enable MergeJoin testcase in TestCollectedGroup for spark engine

2016-09-06 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469348#comment-15469348
 ] 

Xianda Ke edited comment on PIG-4870 at 9/7/16 3:10 AM:


Since the merge join optimization is ready, this just enables 
TestCollectedGroup.testMapsideGroupWithMergeJoin in spark mode.

The patch only removes this line and reformats the surrounding code:
```
if (!Util.isSparkExecType(cluster.getExecType()))
```
[~kellyzly], please help review.


was (Author: kexianda):
Since merge join optimization is ready, just enable 
TestCollectedGroup.testMapsideGroupWithMergeJoin in spark mode.

> Enable MergeJoin testcase in TestCollectedGroup for spark engine
> 
>
> Key: PIG-4870
> URL: https://issues.apache.org/jira/browse/PIG-4870
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4870.patch
>
>
> TestCollectedGroup.testMapsideGroupWithMergeJoin was disabled (PIG-4781).
> When MergeJoin (PIG-4810) is ready, we can enable the unit test for the spark 
> engine.





[jira] [Updated] (PIG-4870) Enable MergeJoin testcase in TestCollectedGroup for spark engine

2016-09-06 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianda Ke updated PIG-4870:
---
Attachment: PIG-4870.patch

Since merge join optimization is ready, just enable 
TestCollectedGroup.testMapsideGroupWithMergeJoin in spark mode.

> Enable MergeJoin testcase in TestCollectedGroup for spark engine
> 
>
> Key: PIG-4870
> URL: https://issues.apache.org/jira/browse/PIG-4870
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4870.patch
>
>
> TestCollectedGroup.testMapsideGroupWithMergeJoin was disabled (PIG-4781).
> When MergeJoin (PIG-4810) is ready, we can enable the unit test for the spark 
> engine.





[jira] [Updated] (PIG-5024) add a physical operator to broadcast small RDDs

2016-09-06 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianda Ke updated PIG-5024:
---
Description: 
Currently, when optimizing some kinds of JOIN, the index or sampling files are 
saved into HDFS. Setting the replication to a larger number makes them serve as 
a distributed cache.

Spark's broadcast mechanism is suitable for this. It seems that we can add a 
physical operator to broadcast small RDDs.
This will benefit the optimization of some specialized joins, such as Skewed 
Join, Replicated Join and so on.



  was:
Currently, when optimize some kinds of JOIN, the indexed or sampling files are 
saved into HDFS. By setting the replication to a larger number, it serves as 
cache.

It seems that we can add a physical operator to broadcast small RDDs.
This will benefit some specialized Joins, such as Skewed Join, Replicated Join 
and so on. 




> add a physical operator to broadcast small RDDs
> ---
>
> Key: PIG-5024
> URL: https://issues.apache.org/jira/browse/PIG-5024
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
>
> Currently, when optimizing some kinds of JOIN, the index or sampling files 
> are saved into HDFS. Setting the replication to a larger number makes them 
> serve as a distributed cache.
> Spark's broadcast mechanism is suitable for this. It seems that we can add a 
> physical operator to broadcast small RDDs.
> This will benefit the optimization of some specialized joins, such as Skewed 
> Join, Replicated Join and so on.
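
Editor's sketch of the idea in PIG-5024, in plain Python rather than Spark or Pig code. It assumes the small relation fits in memory and models the "broadcast" as one lookup table that every partition of the large relation reads. The function name `replicated_join` and the sample rows are hypothetical, and the real operator would presumably hand the table to Spark's broadcast mechanism instead of passing a local dict.

```python
def replicated_join(large_partitions, small_relation, key=0):
    """Join each partition of the large relation against an in-memory copy
    of the small relation, the way a replicated (map-side) join would."""
    # Build the lookup table once. This is the value that would be broadcast
    # to every task instead of being written to HDFS with high replication.
    lookup = {}
    for row in small_relation:
        lookup.setdefault(row[key], []).append(row)

    joined = []
    for partition in large_partitions:      # each partition is processed independently
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append(row + match)  # concatenate the two tuples
    return joined


# Hypothetical data: two partitions of a large relation and one small relation.
large = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
small = [(1, "x"), (2, "y")]
result = replicated_join(large, small)
```

No shuffle of the large relation is needed in this model, because every partition can resolve its keys locally against the shared table.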





[jira] [Commented] (PIG-4897) Scope of param substitution for run/exec commands

2016-09-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468798#comment-15468798
 ] 

Rohini Palaniswamy commented on PIG-4897:
-

I think we should scope parameters only for exec, not for run. For run, we 
should overwrite the passed parameter, since a declare inside run should be 
available outside it, per the definition in the Pig book.

{code}
exec [-param param_name = param_value] [-param_file filename] [script]
Execute the Pig Latin script script. Aliases defined in script are not imported 
into Grunt. This command is useful for testing your Pig Latin scripts while 
inside a Grunt session. You can also run exec without parameters to only run 
the Pig statements before exec. The difference is Pig will not combine them 
with the rest of the script in execution.

run [-param param_name = param_value] [-param_file filename] script
Execute the Pig Latin script script in the current Grunt shell. Thus all 
aliases referenced in script are available to Grunt, and the commands in script 
are accessible via the shell history. This is another option for testing Pig 
Latin scripts while inside a Grunt session.
{code}

> Scope of param substitution for run/exec commands
> -
>
> Key: PIG-4897
> URL: https://issues.apache.org/jira/browse/PIG-4897
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
> Attachments: pig-4897-v01-notestyet.patch, pig-4897-v02.patch, 
> pig-4897-v03.patch, pig-4897-v04.patch, pig-4897-v05.patch
>
>
> After PIG-3359, pig param substitution became global, in that a parameter 
> declared in a pig script called from {{run}} or {{exec}} lives on after 
> that script finishes.  
> This created an interesting situation.
> {code:title=test1.pig}
> exec -param output=/tmp/deleteme111 test1_1.pig
> exec -param output=/tmp/deleteme222 test1_1.pig
> {code}
> {code:title=test1_1.pig}
> %default myout '$output.out';
> A = load 'input.txt' as (a0:int);
> store A into '$myout';
> {code}
> Running {{test1.pig}} would try to run two jobs that both try to write to 
> /tmp/deleteme111 and fail.  (The second param, output=/tmp/deleteme222, is ignored.)
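
Editor's sketch of the behaviour described above, as a toy Python model and not Pig's actual implementation. It assumes a single parameter table in which the first definition of a name wins, which is one way to reproduce the "second param is ignored" symptom. The names `run_script`, `exec_global` and `exec_scoped` are hypothetical.

```python
def run_script(params, cli_params):
    """Toy stand-in for test1_1.pig: apply -param values, then
    %default myout '$output.out', and return the STORE target."""
    for name, value in cli_params.items():
        # Assumption: a name already present in the table is not overwritten.
        params.setdefault(name, value)
    # %default only assigns when the parameter is not defined yet.
    params.setdefault("myout", params["output"] + ".out")
    return params["myout"]


def exec_global(shared, cli_params):
    # Global substitution: every exec mutates the same table.
    return run_script(shared, cli_params)


def exec_scoped(shared, cli_params):
    # Scoped substitution: exec works on a copy, so nothing leaks out.
    return run_script(dict(shared), cli_params)


outputs = ("/tmp/deleteme111", "/tmp/deleteme222")

shared = {}
global_targets = [exec_global(shared, {"output": p}) for p in outputs]

shared = {}
scoped_targets = [exec_scoped(shared, {"output": p}) for p in outputs]
```

In this model `global_targets` holds the same path twice, which mirrors the collision reported in the issue, while `scoped_targets` holds two distinct paths.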





Build failed in Jenkins: Pig-trunk-commit #2370

2016-09-06 Thread Apache Jenkins Server
See 

Changes:

[knoguchi] PIG-5023: Documentation for BagToTuple (icook via knoguchi)

--
[...truncated 3000 lines...]
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.regex...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.util...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.tez...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.tez.plan...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.tez.plan.udf...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.tez.runtime...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.tez.util...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.util...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.hbase...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.streaming...
  [javadoc] Loading source files for package org.apache.pig.builtin...
  [javadoc] Loading source files for package org.apache.pig.builtin.mock...
  [javadoc] Loading source files for package org.apache.pig.classification...
  [javadoc] Loading source files for package org.apache.pig.data...
  [javadoc] Loading source files for package org.apache.pig.data.utils...
  [javadoc] Loading source files for package org.apache.pig.impl...
  [javadoc] Loading source files for package org.apache.pig.impl.builtin...
  [javadoc] Loading source files for package org.apache.pig.impl.io...
  [javadoc] Loading source files for package org.apache.pig.impl.io.compress...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer...
  [javadoc] Loading source files for package 
org.apache.pig.impl.logicalLayer.schema...
  [javadoc] Loading source files for package 
org.apache.pig.impl.logicalLayer.validators...
  [javadoc] Loading source files for package org.apache.pig.impl.plan...
  [javadoc] Loading source files for package 
org.apache.pig.impl.plan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.impl.streaming...
  [javadoc] Loading source files for package org.apache.pig.impl.util...
  [javadoc] Loading source files for package org.apache.pig.impl.util.avro...
  [javadoc] Loading source files for package org.apache.pig.impl.util.hive...
  [javadoc] Loading source files for package org.apache.pig.newplan...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.expression...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.optimizer...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.relational...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.rules...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.visitor...
  [javadoc] Loading source files for package org.apache.pig.newplan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.parser...
  [javadoc] Loading source files for package org.apache.pig.pen...
  [javadoc] Loading source files for package org.apache.pig.pen.util...
  [javadoc] Loading source files for package org.apache.pig.scripting...
  [javadoc] Loading source files for package org.apache.pig.scripting.groovy...
  [javadoc] Loading source files for package org.apache.pig.scripting.jruby...
  [javadoc] Loading source files for package org.apache.pig.scripting.js...
  [javadoc] Loading source files for package org.apache.pig.scripting.jython...
  [javadoc] Loading source files for package 
org.apache.pig.scripting.streaming.python...
  [javadoc] Loading source files for package org.apache.pig.tools...
  [javadoc] Loading source files for package org.apache.pig.tools.cmdline...
  [javadoc] Loading source files for package org.apache.pig.tools.counters...
  [javadoc] Loading source files for package org.apache.pig.tools.grunt...
  [javadoc] Loading source files for package org.apache.pig.tools.parameters...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats...
  [javadoc] Loading source files for package 

[jira] [Resolved] (PIG-5023) Documentation for BagToTuple

2016-09-06 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5023.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 0.17.0

Committed to trunk.

Thanks for your patch, Ian! 

> Documentation for BagToTuple
> 
>
> Key: PIG-5023
> URL: https://issues.apache.org/jira/browse/PIG-5023
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Ian Cook
>Assignee: Ian Cook
> Fix For: 0.17.0
>
> Attachments: PIG-5023.patch
>
>
> {{BagToTuple}} was not documented in Built In Functions. Patch with 
> documentation is attached.





[jira] [Commented] (PIG-5023) Documentation for BagToTuple

2016-09-06 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468487#comment-15468487
 ] 

Koji Noguchi commented on PIG-5023:
---

+1.  Tried the sample code and confirmed the results.  Committing shortly.

> Documentation for BagToTuple
> 
>
> Key: PIG-5023
> URL: https://issues.apache.org/jira/browse/PIG-5023
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Ian Cook
>Assignee: Ian Cook
> Attachments: PIG-5023.patch
>
>
> {{BagToTuple}} was not documented in Built In Functions. Patch with 
> documentation is attached.





[jira] [Created] (PIG-5024) add a physical operator to broadcast small RDDs

2016-09-06 Thread Xianda Ke (JIRA)
Xianda Ke created PIG-5024:
--

 Summary: add a physical operator to broadcast small RDDs
 Key: PIG-5024
 URL: https://issues.apache.org/jira/browse/PIG-5024
 Project: Pig
  Issue Type: Sub-task
Reporter: Xianda Ke
Assignee: Xianda Ke


Currently, when optimizing some kinds of JOIN, the index or sampling files are 
saved into HDFS. Setting the replication to a larger number makes them serve as 
a cache.

It seems that we can add a physical operator to broadcast small RDDs.
This will benefit some specialized joins, such as Skewed Join, Replicated Join 
and so on.







[jira] Subscription: PIG patch available

2016-09-06 Thread jira
Issue Subscription
Filter: PIG patch available (27 issues)

Subscriber: pigdaily

Key       Summary
PIG-4926  Modify the content of start.xml for spark mode
          https://issues-test.apache.org/jira/browse/PIG-4926
PIG-4922  Deadlock between SpillableMemoryManager and InternalSortedBag$SortedDataBagIterator
          https://issues-test.apache.org/jira/browse/PIG-4922
PIG-4918  Pig on Tez cannot switch pig.temp.dir to another fs
          https://issues-test.apache.org/jira/browse/PIG-4918
PIG-4897  Scope of param substitution for run/exec commands
          https://issues-test.apache.org/jira/browse/PIG-4897
PIG-4886  Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode
          https://issues-test.apache.org/jira/browse/PIG-4886
PIG-4854  Merge spark branch to trunk
          https://issues-test.apache.org/jira/browse/PIG-4854
PIG-4849  pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
          https://issues-test.apache.org/jira/browse/PIG-4849
PIG-4788  the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
          https://issues-test.apache.org/jira/browse/PIG-4788
PIG-4745  DataBag should protect content of passed list of tuples
          https://issues-test.apache.org/jira/browse/PIG-4745
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
          https://issues-test.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
          https://issues-test.apache.org/jira/browse/PIG-4656
PIG-4598  Allow user defined plan optimizer rules
          https://issues-test.apache.org/jira/browse/PIG-4598
PIG-4551  Partition filter is not pushed down in case of SPLIT
          https://issues-test.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
          https://issues-test.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
          https://issues-test.apache.org/jira/browse/PIG-4515
PIG-4323  PackageConverter hanging in Spark
          https://issues-test.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues-test.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
          https://issues-test.apache.org/jira/browse/PIG-4251
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues-test.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues-test.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues-test.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues-test.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues-test.apache.org/jira/browse/PIG-3873
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
          https://issues-test.apache.org/jira/browse/PIG-3864
PIG-3851  Upgrade jline to 2.11
          https://issues-test.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues-test.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
          https://issues-test.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues-test.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384


[jira] Subscription: PIG patch available

2016-09-06 Thread jira
Issue Subscription
Filter: PIG patch available (27 issues)

Subscriber: pigdaily

Key       Summary
PIG-4976  streaming job with store clause stuck if the script fail
          https://issues.apache.org/jira/browse/PIG-4976
PIG-4926  Modify the content of start.xml for spark mode
          https://issues.apache.org/jira/browse/PIG-4926
PIG-4922  Deadlock between SpillableMemoryManager and InternalSortedBag$SortedDataBagIterator
          https://issues.apache.org/jira/browse/PIG-4922
PIG-4918  Pig on Tez cannot switch pig.temp.dir to another fs
          https://issues.apache.org/jira/browse/PIG-4918
PIG-4897  Scope of param substitution for run/exec commands
          https://issues.apache.org/jira/browse/PIG-4897
PIG-4854  Merge spark branch to trunk
          https://issues.apache.org/jira/browse/PIG-4854
PIG-4849  pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
          https://issues.apache.org/jira/browse/PIG-4849
PIG-4788  the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
          https://issues.apache.org/jira/browse/PIG-4788
PIG-4745  DataBag should protect content of passed list of tuples
          https://issues.apache.org/jira/browse/PIG-4745
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
          https://issues.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
          https://issues.apache.org/jira/browse/PIG-4656
PIG-4598  Allow user defined plan optimizer rules
          https://issues.apache.org/jira/browse/PIG-4598
PIG-4551  Partition filter is not pushed down in case of SPLIT
          https://issues.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
          https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
          https://issues.apache.org/jira/browse/PIG-4515
PIG-4323  PackageConverter hanging in Spark
          https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
          https://issues.apache.org/jira/browse/PIG-3864
PIG-3851  Upgrade jline to 2.11
          https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384