[jira] [Commented] (PIG-4634) Fix records count issues in output statistics

2015-10-30 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982091#comment-14982091
 ] 

Xianda Ke commented on PIG-4634:


Hi [~mohitsabharwal], thank you for your comments. The code readability nits 
are fixed (https://reviews.apache.org/r/37627/diff/5-6/). Thanks a lot!



> Fix records count issues in output statistics
> -
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, 
> PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by 
> the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and 
> getRecordWritten() have not been implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4634) Fix records count issues in output statistics

2015-10-30 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianda Ke updated PIG-4634:
---
Attachment: PIG-4634-6.patch

> Fix records count issues in output statistics
> -
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, 
> PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by 
> the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and 
> getRecordWritten() have not been implemented.





[jira] [Commented] (PIG-4634) Fix records count issues in output statistics

2015-10-30 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982093#comment-14982093
 ] 

Xianda Ke commented on PIG-4634:


The latest patch, PIG-4634-6.patch, is attached.

> Fix records count issues in output statistics
> -
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, 
> PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by 
> the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and 
> getRecordWritten() have not been implemented.





[jira] Subscription: PIG patch available

2015-10-30 Thread jira
Issue Subscription
Filter: PIG patch available (27 issues)

Subscriber: pigdaily

Key       Summary
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
          https://issues.apache.org/jira/browse/PIG-4684
PIG-4677  Display failure information on stop on failure
          https://issues.apache.org/jira/browse/PIG-4677
PIG-4675  Multi Store Statement will fail on the second store statement.
          https://issues.apache.org/jira/browse/PIG-4675
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
          https://issues.apache.org/jira/browse/PIG-4656
PIG-4641  Print the instance of Object without using toString()
          https://issues.apache.org/jira/browse/PIG-4641
PIG-4598  Allow user defined plan optimizer rules
          https://issues.apache.org/jira/browse/PIG-4598
PIG-4581  thread safe issue in NodeIdGenerator
          https://issues.apache.org/jira/browse/PIG-4581
PIG-4539  New PigUnit
          https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
          https://issues.apache.org/jira/browse/PIG-4515
PIG-4455  Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
          https://issues.apache.org/jira/browse/PIG-4455
PIG-4417  Pig's register command should support automatic fetching of jars from repo.
          https://issues.apache.org/jira/browse/PIG-4417
PIG-4373  Implement PIG-3861 in Tez
          https://issues.apache.org/jira/browse/PIG-4373
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
          https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
          https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4111  Make Pig compiles with avro-1.7.7
          https://issues.apache.org/jira/browse/PIG-4111
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
          https://issues.apache.org/jira/browse/PIG-3866
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
          https://issues.apache.org/jira/browse/PIG-3864
PIG-3851  Upgrade jline to 2.11
          https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384


[jira] [Commented] (PIG-4520) Pig Eclipse project generation issues

2015-10-30 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983606#comment-14983606
 ] 

Gabor Liptak commented on PIG-4520:
---

The instructions at 
https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
 have been updated, and I now have Pig working in Eclipse. Thanks, [~eyal].

> Pig Eclipse project generation issues
> -
>
> Key: PIG-4520
> URL: https://issues.apache.org/jira/browse/PIG-4520
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Reporter: Gabor Liptak
> Attachments: eclipse20.log, eclipse23.log
>
>
> Running commands on
> https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
> $ ant clean eclipse-files
> succeeds, but Eclipse shows 1565 errors. I uploaded eclipse20.log with the 
> errors.
> Noting "tez" related errors, I ran:
> $ ant setTezEnv eclipse-files
> succeeds, but Eclipse shows 505 errors. I uploaded eclipse23.log with the 
> errors.
> Incidentally, running both:
> $ ant clean setTezEnv eclipse-files
> $ ant clean eclipse-files compile gen
> fails with following error:
> BUILD FAILED
> /tmp/pig/build.xml:326: taskdef class prantl.ant.eclipse.EclipseTask cannot 
> be found
>  using the classloader AntClassLoader[]





[jira] [Resolved] (PIG-4520) Pig Eclipse project generation issues

2015-10-30 Thread Gabor Liptak (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Liptak resolved PIG-4520.
---
Resolution: Done

> Pig Eclipse project generation issues
> -
>
> Key: PIG-4520
> URL: https://issues.apache.org/jira/browse/PIG-4520
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Reporter: Gabor Liptak
> Attachments: eclipse20.log, eclipse23.log
>
>
> Running commands on
> https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
> $ ant clean eclipse-files
> succeeds, but Eclipse shows 1565 errors. I uploaded eclipse20.log with the 
> errors.
> Noting "tez" related errors, I ran:
> $ ant setTezEnv eclipse-files
> succeeds, but Eclipse shows 505 errors. I uploaded eclipse23.log with the 
> errors.
> Incidentally, running both:
> $ ant clean setTezEnv eclipse-files
> $ ant clean eclipse-files compile gen
> fails with following error:
> BUILD FAILED
> /tmp/pig/build.xml:326: taskdef class prantl.ant.eclipse.EclipseTask cannot 
> be found
>  using the classloader AntClassLoader[]





[jira] [Commented] (PIG-4713) Document Bloom UDF

2015-10-30 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983643#comment-14983643
 ] 

Gabor Liptak commented on PIG-4713:
---

https://pig.apache.org/docs/r0.15.0/func.html

[~rohini] Which section do you think the Bloom filter documentation should go into? Thanks.

> Document Bloom UDF
> --
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
>  Issue Type: Task
>Reporter: Rohini Palaniswamy
>  Labels: newbie
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go 
> into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of 
> Apache Pig documentation.  
> Saw one user trying to use Bloom Filter to filter data on a different column 
> than the join column which should not be done as Bloom Filters give false 
> positives and can include records that actually don't match the filter 
> criteria. That should be documented as well and highlighted to avoid users 
> trying to use Bloom Filters for just regular filtering. 
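The false-positive caveat in the description can be illustrated with a small, generic Bloom filter. This is only a sketch under stated assumptions: the class name TinyBloom, the double-hashing scheme, and the sizes are hypothetical and chosen for illustration; it is not the implementation behind Pig's Bloom UDF.

```java
import java.util.BitSet;

// Hypothetical, minimal Bloom filter; not Pig's implementation.
class TinyBloom {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TinyBloom(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // i-th probe position via double hashing; floorMod keeps it non-negative.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false means "definitely absent"; true only means "possibly present".
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloom filter = new TinyBloom(16, 2);
        String[] joinKeys = {"k1", "k2", "k3", "k4", "k5", "k6"};
        for (String k : joinKeys) {
            filter.add(k);
        }

        int falsePositives = 0;
        for (int i = 0; i < 1000; i++) {
            // "other-N" keys were never added, so every hit is a false positive.
            if (filter.mightContain("other-" + i)) {
                falsePositives++;
            }
        }
        System.out.println("false positives out of 1000: " + falsePositives);
    }
}
```

A key that was added is always reported as possibly present, but keys that were never added can be reported too. Before a join this is harmless, because the join itself discards the extra records. Used as a plain filter on some other column, those extra records stay in the output.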





[jira] [Updated] (PIG-4621) Enable Illustrate in spark

2015-10-30 Thread Syed Zulfiqar Ali (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Zulfiqar Ali updated PIG-4621:
---
Assignee: Syed Zulfiqar Ali

> Enable Illustrate in spark
> --
>
> Key: PIG-4621
> URL: https://issues.apache.org/jira/browse/PIG-4621
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: Syed Zulfiqar Ali
> Fix For: spark-branch
>
>
> Currently, we don't support illustrate in spark mode.
> For how illustrate works, see:
> http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#ILLUSTRATE





[jira] [Work started] (PIG-4655) Support InputStats in spark mode

2015-10-30 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-4655 started by Xianda Ke.
--
> Support InputStats in spark mode
> 
>
> Key: PIG-4655
> URL: https://issues.apache.org/jira/browse/PIG-4655
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4655-2.patch, PIG-4655-3.patch, PIG-4655-4.patch, 
> PIG-4655.patch
>
>
> Currently, InputStats is not implemented in spark mode. 
> The JUnit case TestPigRunner.testEmptyFileCounter() will fail.





[jira] [Created] (PIG-4722) [Pig on Tez] NPE while running Combiner

2015-10-30 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4722:
---

 Summary: [Pig on Tez] NPE while running Combiner
 Key: PIG-4722
 URL: https://issues.apache.org/jira/browse/PIG-4722
 Project: Pig
  Issue Type: Bug
Reporter: Rohini Palaniswamy


DefaultSorter in Tez calls Combiner from two different threads - during spill 
in SpillThread and flush in the main thread. If both run the combiner, one ends 
up with NPE as Reporter is set on only one thread in initialization. 

{code}
java.lang.NullPointerException
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at 
org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
at 
org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
at 
org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
{code}
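The failure mode described above can be reproduced in isolation. The following is a minimal sketch, assuming the reporter lives in per-thread storage (the description only says it is "set on only one thread"). ReporterThreadDemo, Reporter and getNext() here are hypothetical stand-ins and are not the actual Pig or Tez classes.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins; not the actual Pig or Tez classes.
class ReporterThreadDemo {
    interface Reporter {
        void progress();
    }

    // Per-thread slot: a value set on one thread is invisible to others.
    static final ThreadLocal<Reporter> REPORTER = new ThreadLocal<>();

    // Mimics an operator that reports progress on every call.
    static void getNext() {
        REPORTER.get().progress(); // NPE if this thread never called set()
    }

    // Runs getNext() on a fresh thread and returns what it threw, or null.
    static Throwable runOnNewThread() {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                getNext();
            } catch (Throwable t) {
                failure.set(t);
            }
        }, "SpillThread");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        REPORTER.set(() -> { }); // initialization happens on the main thread only
        getNext();               // fine on the main thread
        System.out.println("second thread failed with: " + runOnNewThread());
    }
}
```

Under that assumption, one way to avoid the exception is to initialize the reporter on every thread that may run the combiner. Whether that is the right fix for Pig on Tez is not stated in this thread.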





[jira] [Updated] (PIG-4722) [Pig on Tez] NPE while running Combiner

2015-10-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4722:

Description: 
DefaultSorter in Tez calls Combiner from two different threads - during spill 
in SpillThread and flush in the main thread. If both run the combiner, one ends 
up with NPE as Reporter is set on only one thread in initialization. 

{code}
java.lang.NullPointerException
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at 
org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
at 
org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
at 
org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
Caused by: java.lang.NullPointerException
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:303)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNextTuple(POProject.java:403)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
... 12 more
{code}

  was:
DefaultSorter in Tez calls Combiner from two different threads - during spill 
in SpillThread and flush in the main thread. If both run the combiner, one ends 
up with NPE as Reporter is set on only one thread in initialization. 

{code}
java.lang.NullPointerException
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at 
org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
at 
org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
at 
org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
at 
org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
{code}


> [Pig on Tez] NPE while running Combiner
> ---
>
> Key: PIG-4722
> URL: https://issues.apache.org/jira/browse/PIG-4722
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>
> DefaultSorter in Tez calls Combiner from two different threads - during spill 
> in SpillThread and flush in the main thread. If both run the combiner, one 
> ends up with NPE as Reporter is set on only one thread in initialization. 
> {code}
> java.lang.NullPointerException
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
> at 
> 

[jira] [Updated] (PIG-4655) Support InputStats in spark mode

2015-10-30 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianda Ke updated PIG-4655:
---
Attachment: PIG-4655-4.patch

Hi [~xuefuz],
Rebased on PIG-4634; the latest patch, PIG-4655-4.patch, is attached. 

> Support InputStats in spark mode
> 
>
> Key: PIG-4655
> URL: https://issues.apache.org/jira/browse/PIG-4655
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4655-2.patch, PIG-4655-3.patch, PIG-4655-4.patch, 
> PIG-4655.patch
>
>
> Currently, InputStats is not implemented in spark mode. 
> The JUnit case TestPigRunner.testEmptyFileCounter() will fail.





[jira] [Resolved] (PIG-4634) Fix records count issues in output statistics

2015-10-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-4634.
--
Resolution: Fixed

Committed to Spark branch. Thanks, Xianda.

> Fix records count issues in output statistics
> -
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, 
> PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by 
> the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and 
> getRecordWritten() have not been implemented.





[jira] [Work started] (PIG-4720) Spark related JARs are not included when importing project via IDE

2015-10-30 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-4720 started by Xianda Ke.
--
>  Spark related JARs are not included when importing project via IDE
> ---
>
> Key: PIG-4720
> URL: https://issues.apache.org/jira/browse/PIG-4720
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4720.patch
>
>
> It is a minor issue. Spark-related JARs are not included when importing the 
> project via an IDE.
> {code}
> $ ant -Dhadoopversion=23 eclipse-files 
> {code}
> In the generated .classpath, the Spark-related JARs are missing from the 
> classpathentry list, because the Spark JARs were moved to a new directory 
> (PIG-4667) but the eclipse-files target in build.xml was not updated.





[jira] [Commented] (PIG-4713) Document Bloom UDF

2015-10-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983689#comment-14983689
 ] 

Daniel Dai commented on PIG-4713:
-

It should go in the Eval Functions section.

> Document Bloom UDF
> --
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
>  Issue Type: Task
>Reporter: Rohini Palaniswamy
>  Labels: newbie
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go 
> into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of 
> Apache Pig documentation.  
> Saw one user trying to use Bloom Filter to filter data on a different column 
> than the join column which should not be done as Bloom Filters give false 
> positives and can include records that actually don't match the filter 
> criteria. That should be documented as well and highlighted to avoid users 
> trying to use Bloom Filters for just regular filtering. 





[jira] [Updated] (PIG-4713) Document Bloom UDF

2015-10-30 Thread Gabor Liptak (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Liptak updated PIG-4713:
--
Attachment: PIG-4713.1.patch

> Document Bloom UDF
> --
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
>  Issue Type: Task
>Reporter: Rohini Palaniswamy
>  Labels: newbie
> Attachments: PIG-4713.1.patch
>
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go 
> into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of 
> Apache Pig documentation.  
> Saw one user trying to use Bloom Filter to filter data on a different column 
> than the join column which should not be done as Bloom Filters give false 
> positives and can include records that actually don't match the filter 
> criteria. That should be documented as well and highlighted to avoid users 
> trying to use Bloom Filters for just regular filtering. 





[jira] [Updated] (PIG-4713) Document Bloom UDF

2015-10-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4713:

   Resolution: Fixed
 Assignee: Gabor Liptak
 Hadoop Flags: Reviewed
Fix Version/s: 0.16.0
   Status: Resolved  (was: Patch Available)

+1. Patch committed to trunk. Thanks Gabor!

> Document Bloom UDF
> --
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
>  Issue Type: Task
>Reporter: Rohini Palaniswamy
>Assignee: Gabor Liptak
>  Labels: newbie
> Fix For: 0.16.0
>
> Attachments: PIG-4713.1.patch
>
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go 
> into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of 
> Apache Pig documentation.  
> Saw one user trying to use Bloom Filter to filter data on a different column 
> than the join column which should not be done as Bloom Filters give false 
> positives and can include records that actually don't match the filter 
> criteria. That should be documented as well and highlighted to avoid users 
> trying to use Bloom Filters for just regular filtering. 





[jira] [Created] (PIG-4721) IsEmpty documentation error

2015-10-30 Thread Nathan Smith (JIRA)
Nathan Smith created PIG-4721:
-

 Summary: IsEmpty documentation error
 Key: PIG-4721
 URL: https://issues.apache.org/jira/browse/PIG-4721
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.15.0
Reporter: Nathan Smith


http://pig.apache.org/docs/r0.15.0/func.html#isempty

The documentation example uses a left outer join, but this produces a flat 
tuple, which is invalid for IsEmpty. I believe the example in the docs should 
be:

{code}
SSN = load 'ssn.txt' using PigStorage() as (ssn:long);

SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);

/* do a cogroup of SSN with SSN_Name */
X = COGROUP SSN by ssn, SSN_NAME by ssn;

/* only keep those ssn's for which there is no name */
Y = filter X by IsEmpty(SSN_NAME);
{code}
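The difference the report relies on is that COGROUP yields one bag per input relation for each key, and a bag can be empty, whereas a join yields flat tuples with no bag to test. A rough sketch of the COGROUP-then-IsEmpty logic outside Pig, using plain Java collections as stand-ins for bags, is shown below. CogroupDemo and its method names are hypothetical, and for brevity the sketch keeps only keys that occur in the first relation, which real COGROUP does not do.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; lists stand in for Pig bags.
class CogroupDemo {
    // For every ssn, collect the bag of matching names. The bag is empty
    // (not null) when no student row has that ssn.
    static Map<Long, List<String>> cogroup(List<Long> ssns,
                                           List<Map.Entry<Long, String>> students) {
        Map<Long, List<String>> bags = new LinkedHashMap<>();
        for (Long ssn : ssns) {
            bags.put(ssn, new ArrayList<>());
        }
        for (Map.Entry<Long, String> row : students) {
            List<String> bag = bags.get(row.getKey());
            if (bag != null) {
                bag.add(row.getValue());
            }
        }
        return bags;
    }

    // Rough equivalent of: filter X by IsEmpty(SSN_NAME)
    static List<Long> ssnsWithoutName(Map<Long, List<String>> bags) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : bags.entrySet()) {
            if (e.getValue().isEmpty()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ssns = Arrays.asList(1L, 2L, 3L);
        List<Map.Entry<Long, String>> students = new ArrayList<>();
        students.add(new SimpleEntry<>(1L, "ann"));
        students.add(new SimpleEntry<>(3L, "cy"));
        System.out.println(ssnsWithoutName(cogroup(ssns, students))); // prints [2]
    }
}
```

With a left outer join, the unmatched ssn would instead appear as a flat row whose name field is null, so the natural test there is a null check on the field rather than IsEmpty.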





[jira] [Updated] (PIG-4721) IsEmpty documentation error

2015-10-30 Thread Nathan Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Smith updated PIG-4721:
--
Priority: Trivial  (was: Major)

> IsEmpty documentation error
> ---
>
> Key: PIG-4721
> URL: https://issues.apache.org/jira/browse/PIG-4721
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>Priority: Trivial
>
> http://pig.apache.org/docs/r0.15.0/func.html#isempty
> The documentation example uses a left outer join, but this produces a flat 
> tuple, which is invalid for IsEmpty. I believe the example in the docs should 
> be:
> {code}
> SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
> SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, 
> name:chararray);
> /* do a cogroup of SSN with SSN_Name */
> X = COGROUP SSN by ssn, SSN_NAME by ssn;
> /* only keep those ssn's for which there is no name */
> Y = filter X by IsEmpty(SSN_NAME);
> {code}





[jira] [Commented] (PIG-4634) Fix records count issues in output statistics

2015-10-30 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982908#comment-14982908
 ] 

Mohit Sabharwal commented on PIG-4634:
--

Thanks, [~kexianda]! 

+1 (non-binding)

> Fix records count issues in output statistics
> -
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, 
> PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by 
> the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and 
> getRecordWritten() have not been implemented.





[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.

2015-10-30 Thread Akshay Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982965#comment-14982965
 ] 

Akshay Rai commented on PIG-4417:
-

Thanks, [~daijy]. I will update the patch soon. I might need some help with 
the tests.

> Pig's register command should support automatic fetching of jars from repo.
> ---
>
> Key: PIG-4417
> URL: https://issues.apache.org/jira/browse/PIG-4417
> Project: Pig
>  Issue Type: Improvement
>Reporter: Akshay Rai
>Assignee: Akshay Rai
> Attachments: PIG-4417.1.patch, PIG-4417.2.patch
>
>
> Currently, Pig's register command takes a local path to a dependency jar. 
> This clutters the local file system, as users may forget to remove the jar 
> later.
> It would be nice if Pig supported a Gradle-like notation to download the jar 
> from a repository.
> Ex: At the top of the Pig script a user could add
> register '::'; 
> It should be backward compatible and should support a local file path if so 
> desired.
> RB: https://reviews.apache.org/r/31662/





[jira] [Updated] (PIG-4721) IsEmpty documentation error

2015-10-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4721:

Attachment: PIG-4721.patch

> IsEmpty documentation error
> ---
>
> Key: PIG-4721
> URL: https://issues.apache.org/jira/browse/PIG-4721
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>Priority: Trivial
> Attachments: PIG-4721.patch
>
>
> http://pig.apache.org/docs/r0.15.0/func.html#isempty
> The documentation example uses a left outer join, but this produces a flat 
> tuple, which is invalid for IsEmpty. I believe the example in the docs should 
> be:
> {code}
> SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
> SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, 
> name:chararray);
> /* do a cogroup of SSN with SSN_Name */
> X = COGROUP SSN by ssn, SSN_NAME by ssn;
> /* only keep those ssn's for which there is no name */
> Y = filter X by IsEmpty(SSN_NAME);
> {code}





[jira] [Resolved] (PIG-4721) IsEmpty documentation error

2015-10-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-4721.
-
   Resolution: Fixed
 Assignee: Nathan Smith
Fix Version/s: 0.16.0

Fixed. Thanks Nathan!

> IsEmpty documentation error
> ---
>
> Key: PIG-4721
> URL: https://issues.apache.org/jira/browse/PIG-4721
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>Assignee: Nathan Smith
>Priority: Trivial
> Fix For: 0.16.0
>
> Attachments: PIG-4721.patch
>
>
> http://pig.apache.org/docs/r0.15.0/func.html#isempty
> The documentation example uses a left outer join, but this produces a flat 
> tuple, which is invalid for IsEmpty. I believe the example in the docs should 
> be:
> {code}
> SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
> SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, 
> name:chararray);
> /* do a cogroup of SSN with SSN_Name */
> X = COGROUP SSN by ssn, SSN_NAME by ssn;
> /* only keep those ssn's for which there is no name */
> Y = filter X by IsEmpty(SSN_NAME);
> {code}


