[jira] [Commented] (PIG-4634) Fix records count issues in output statistics
[ https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982091#comment-14982091 ]

Xianda Ke commented on PIG-4634:

Hi [~mohitsabharwal],
Thank you for your comments. The code readability nits are fixed (https://reviews.apache.org/r/37627/diff/5-6/). Thanks a lot!

> Fix records count issues in output statistics
> ---------------------------------------------
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and getRecordWritten() have not been implemented.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4634) Fix records count issues in output statistics
[ https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianda Ke updated PIG-4634:
---------------------------
    Attachment: PIG-4634-6.patch

> Fix records count issues in output statistics
> ---------------------------------------------
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and getRecordWritten() have not been implemented.
[jira] [Commented] (PIG-4634) Fix records count issues in output statistics
[ https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982093#comment-14982093 ]

Xianda Ke commented on PIG-4634:

The latest patch, PIG-4634-6.patch, is attached.

> Fix records count issues in output statistics
> ---------------------------------------------
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and getRecordWritten() have not been implemented.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (27 issues)
Subscriber: pigdaily

Key         Summary
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4677    Display failure information on stop on failure
            https://issues.apache.org/jira/browse/PIG-4677
PIG-4675    Multi Store Statement will fail on the second store statement.
            https://issues.apache.org/jira/browse/PIG-4675
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4641    Print the instance of Object without using toString()
            https://issues.apache.org/jira/browse/PIG-4641
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4417    Pig's register command should support automatic fetching of jars from repo.
            https://issues.apache.org/jira/browse/PIG-4417
PIG-4373    Implement PIG-3861 in Tez
            https://issues.apache.org/jira/browse/PIG-4373
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384
[jira] [Commented] (PIG-4520) Pig Eclipse project generation issues
[ https://issues.apache.org/jira/browse/PIG-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983606#comment-14983606 ]

Gabor Liptak commented on PIG-4520:
-----------------------------------

The instructions at https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment got updated, and now I got Pig in Eclipse. Thanks [~eyal]

> Pig Eclipse project generation issues
> -------------------------------------
>
> Key: PIG-4520
> URL: https://issues.apache.org/jira/browse/PIG-4520
> Project: Pig
> Issue Type: Bug
> Components: build
> Reporter: Gabor Liptak
> Attachments: eclipse20.log, eclipse23.log
>
> Running commands on https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
> $ ant clean eclipse-files
> succeeds, but Eclipse shows 1565 errors. I uploaded eclipse20.log with the errors.
> Noting "tez" related errors, I ran:
> $ ant setTezEnv eclipse-files
> succeeds, but Eclipse shows 505 errors. I uploaded eclipse23.log with the errors.
> Incidentally, running both:
> $ ant clean setTezEnv eclipse-files
> $ ant clean eclipse-files compile gen
> fails with following error:
> BUILD FAILED
> /tmp/pig/build.xml:326: taskdef class prantl.ant.eclipse.EclipseTask cannot be found
> using the classloader AntClassLoader[]
[jira] [Resolved] (PIG-4520) Pig Eclipse project generation issues
[ https://issues.apache.org/jira/browse/PIG-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Liptak resolved PIG-4520.
-------------------------------
    Resolution: Done

> Pig Eclipse project generation issues
> -------------------------------------
>
> Key: PIG-4520
> URL: https://issues.apache.org/jira/browse/PIG-4520
> Project: Pig
> Issue Type: Bug
> Components: build
> Reporter: Gabor Liptak
> Attachments: eclipse20.log, eclipse23.log
>
> Running commands on https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
> $ ant clean eclipse-files
> succeeds, but Eclipse shows 1565 errors. I uploaded eclipse20.log with the errors.
> Noting "tez" related errors, I ran:
> $ ant setTezEnv eclipse-files
> succeeds, but Eclipse shows 505 errors. I uploaded eclipse23.log with the errors.
> Incidentally, running both:
> $ ant clean setTezEnv eclipse-files
> $ ant clean eclipse-files compile gen
> fails with following error:
> BUILD FAILED
> /tmp/pig/build.xml:326: taskdef class prantl.ant.eclipse.EclipseTask cannot be found
> using the classloader AntClassLoader[]
[jira] [Commented] (PIG-4713) Document Bloom UDF
[ https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983643#comment-14983643 ]

Gabor Liptak commented on PIG-4713:
-----------------------------------

https://pig.apache.org/docs/r0.15.0/func.html
[~rohini] Which section do you see the Bloom filter going into? Thanks

> Document Bloom UDF
> ------------------
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
> Issue Type: Task
> Reporter: Rohini Palaniswamy
> Labels: newbie
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of Apache Pig documentation.
> Saw one user trying to use Bloom Filter to filter data on a different column than the join column which should not be done as Bloom Filters give false positives and can include records that actually don't match the filter criteria. That should be documented as well and highlighted to avoid users trying to use Bloom Filters for just regular filtering.
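The false-positive behaviour described in the issue can be seen in a toy Bloom filter. This is only a minimal sketch, not Pig's Bloom UDF: the class name, the 16-bit size, and both hash functions are assumptions chosen so that a collision is easy to follow by hand.

```java
import java.util.BitSet;

// Toy Bloom filter (hypothetical, not Pig's implementation) showing why a
// positive answer cannot be trusted for plain filtering.
final class BloomFalsePositiveSketch {
    private static final int BITS = 16;  // deliberately tiny so collisions are easy to see
    private final BitSet bits = new BitSet(BITS);

    // Two illustrative hash functions; both are assumptions for this sketch.
    private static int h1(long key) { return (int) Math.floorMod(key, (long) BITS); }
    private static int h2(long key) { return (int) Math.floorMod(key * 7 + 3, (long) BITS); }

    void add(long key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // true means "possibly present"; false means "definitely absent".
    boolean mightContain(long key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }

    public static void main(String[] args) {
        BloomFalsePositiveSketch filter = new BloomFalsePositiveSketch();
        filter.add(5L);
        System.out.println(filter.mightContain(5L));   // true: key was added
        System.out.println(filter.mightContain(21L));  // true: false positive, 21 sets the same bits as 5
        System.out.println(filter.mightContain(4L));   // false: definitely absent
    }
}
```

In a join, the false positive for 21 is harmless because the join itself discards the non-matching record afterwards. Used as an ordinary filter, nothing removes it, which is the misuse the issue asks the documentation to warn about.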
[jira] [Updated] (PIG-4621) Enable Illustrate in spark
[ https://issues.apache.org/jira/browse/PIG-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Zulfiqar Ali updated PIG-4621:
-----------------------------------
    Assignee: Syed Zulfiqar Ali

> Enable Illustrate in spark
> --------------------------
>
> Key: PIG-4621
> URL: https://issues.apache.org/jira/browse/PIG-4621
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Syed Zulfiqar Ali
> Fix For: spark-branch
>
> Currently we don't support illustrate in spark mode.
> For how illustrate works, see: http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#ILLUSTRATE
[jira] [Work started] (PIG-4655) Support InputStats in spark mode
[ https://issues.apache.org/jira/browse/PIG-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-4655 started by Xianda Ke.
--------------------------------------

> Support InputStats in spark mode
> --------------------------------
>
> Key: PIG-4655
> URL: https://issues.apache.org/jira/browse/PIG-4655
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4655-2.patch, PIG-4655-3.patch, PIG-4655-4.patch, PIG-4655.patch
>
> Currently, InputStats is not implemented in spark mode.
> The JUnit case TestPigRunner.testEmptyFileCounter() will fail.
[jira] [Created] (PIG-4722) [Pig on Tez] NPE while running Combiner
Rohini Palaniswamy created PIG-4722:
------------------------------------

Summary: [Pig on Tez] NPE while running Combiner
Key: PIG-4722
URL: https://issues.apache.org/jira/browse/PIG-4722
Project: Pig
Issue Type: Bug
Reporter: Rohini Palaniswamy

DefaultSorter in Tez calls Combiner from two different threads - during spill in SpillThread and flush in the main thread. If both run the combiner, one ends up with NPE as Reporter is set on only one thread in initialization.

{code}
java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
    at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
    at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
{code}
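The failure mode in the description (state set on one thread, then read from another) can be reproduced in isolation. The sketch below assumes the reporter lives in per-thread storage, which the wording "set on only one thread" suggests but the report does not spell out. The class, field, and method names are hypothetical stand-ins, not Pig or Tez code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a combiner whose progress reporter is kept per thread.
final class ReporterPerThreadSketch {
    // Assumed per-thread storage; a Runnable stands in for the real reporter type.
    static final ThreadLocal<Runnable> REPORTER = new ThreadLocal<>();

    // True if the calling thread has a reporter set.
    static boolean hasReporter() {
        return REPORTER.get() != null;
    }

    // Checks for a reporter from a freshly started thread, the way a spill
    // thread would when it runs the combiner.
    static boolean hasReporterOnNewThread() {
        AtomicBoolean seen = new AtomicBoolean();
        Thread spill = new Thread(() -> seen.set(hasReporter()), "spill-thread");
        spill.start();
        try {
            spill.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        REPORTER.set(() -> { });  // initialization happens on the main thread only
        System.out.println("main thread has reporter: " + hasReporter());              // true
        System.out.println("spill thread has reporter: " + hasReporterOnNewThread());  // false
    }
}
```

Any code on the second thread that dereferences the missing reporter without a null check would throw a NullPointerException, consistent with the trace above originating under DefaultSorter$SpillThread.run.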
[jira] [Updated] (PIG-4722) [Pig on Tez] NPE while running Combiner
[ https://issues.apache.org/jira/browse/PIG-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4722:
------------------------------------
    Description:
DefaultSorter in Tez calls Combiner from two different threads - during spill in SpillThread and flush in the main thread. If both run the combiner, one ends up with NPE as Reporter is set on only one thread in initialization.

{code}
java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
    at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
    at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:303)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNextTuple(POProject.java:403)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
    ... 12 more
{code}

  was:
DefaultSorter in Tez calls Combiner from two different threads - during spill in SpillThread and flush in the main thread. If both run the combiner, one ends up with NPE as Reporter is set on only one thread in initialization.

{code}
java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
    at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
    at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
{code}

> [Pig on Tez] NPE while running Combiner
> ---------------------------------------
>
> Key: PIG-4722
> URL: https://issues.apache.org/jira/browse/PIG-4722
> Project: Pig
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
>
> DefaultSorter in Tez calls Combiner from two different threads - during spill in SpillThread and flush in the main thread. If both run the combiner, one ends up with NPE as Reporter is set on only one thread in initialization.
> {code}
> java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
>     at
[jira] [Updated] (PIG-4655) Support InputStats in spark mode
[ https://issues.apache.org/jira/browse/PIG-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianda Ke updated PIG-4655:
---------------------------
    Attachment: PIG-4655-4.patch

Hi [~xuefuz],
Rebased on PIG-4634, latest PIG-4655-4.patch is attached.

> Support InputStats in spark mode
> --------------------------------
>
> Key: PIG-4655
> URL: https://issues.apache.org/jira/browse/PIG-4655
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4655-2.patch, PIG-4655-3.patch, PIG-4655-4.patch, PIG-4655.patch
>
> Currently, InputStats is not implemented in spark mode.
> The JUnit case TestPigRunner.testEmptyFileCounter() will fail.
[jira] [Resolved] (PIG-4634) Fix records count issues in output statistics
[ https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang resolved PIG-4634.
------------------------------
    Resolution: Fixed

Committed to Spark branch. Thanks, Xianda.

> Fix records count issues in output statistics
> ---------------------------------------------
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and getRecordWritten() have not been implemented.
[jira] [Work started] (PIG-4720) Spark related JARs are not included when importing project via IDE
[ https://issues.apache.org/jira/browse/PIG-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-4720 started by Xianda Ke.
--------------------------------------

> Spark related JARs are not included when importing project via IDE
> -------------------------------------------------------------------
>
> Key: PIG-4720
> URL: https://issues.apache.org/jira/browse/PIG-4720
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4720.patch
>
> It is a minor issue. Spark related JARs are not included when importing the project via IDE.
> {code}
> $ ant -Dhadoopversion=23 eclipse-files
> {code}
> In the generated .classpath, the Spark related JARs are missing from the classpathentry list, because the Spark JARs were moved to a new directory (PIG-4667) but the eclipse-files target in build.xml was not changed.
[jira] [Commented] (PIG-4713) Document Bloom UDF
[ https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983689#comment-14983689 ]

Daniel Dai commented on PIG-4713:
---------------------------------

It should be in Eval Functions section.

> Document Bloom UDF
> ------------------
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
> Issue Type: Task
> Reporter: Rohini Palaniswamy
> Labels: newbie
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of Apache Pig documentation.
> Saw one user trying to use Bloom Filter to filter data on a different column than the join column which should not be done as Bloom Filters give false positives and can include records that actually don't match the filter criteria. That should be documented as well and highlighted to avoid users trying to use Bloom Filters for just regular filtering.
[jira] [Updated] (PIG-4713) Document Bloom UDF
[ https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Liptak updated PIG-4713:
------------------------------
    Attachment: PIG-4713.1.patch

> Document Bloom UDF
> ------------------
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
> Issue Type: Task
> Reporter: Rohini Palaniswamy
> Labels: newbie
> Attachments: PIG-4713.1.patch
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of Apache Pig documentation.
> Saw one user trying to use Bloom Filter to filter data on a different column than the join column which should not be done as Bloom Filters give false positives and can include records that actually don't match the filter criteria. That should be documented as well and highlighted to avoid users trying to use Bloom Filters for just regular filtering.
[jira] [Updated] (PIG-4713) Document Bloom UDF
[ https://issues.apache.org/jira/browse/PIG-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4713:
----------------------------
    Resolution: Fixed
    Assignee: Gabor Liptak
    Hadoop Flags: Reviewed
    Fix Version/s: 0.16.0
    Status: Resolved (was: Patch Available)

+1. Patch committed to trunk. Thanks Gabor!

> Document Bloom UDF
> ------------------
>
> Key: PIG-4713
> URL: https://issues.apache.org/jira/browse/PIG-4713
> Project: Pig
> Issue Type: Task
> Reporter: Rohini Palaniswamy
> Assignee: Gabor Liptak
> Labels: newbie
> Fix For: 0.16.0
>
> Attachments: PIG-4713.1.patch
>
> Release notes of https://issues.apache.org/jira/browse/PIG-2328 should go into Builtin Functions (https://pig.apache.org/docs/r0.15.0/func.html) of Apache Pig documentation.
> Saw one user trying to use Bloom Filter to filter data on a different column than the join column which should not be done as Bloom Filters give false positives and can include records that actually don't match the filter criteria. That should be documented as well and highlighted to avoid users trying to use Bloom Filters for just regular filtering.
[jira] [Created] (PIG-4721) IsEmpty documentation error
Nathan Smith created PIG-4721:
------------------------------

Summary: IsEmpty documentation error
Key: PIG-4721
URL: https://issues.apache.org/jira/browse/PIG-4721
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.15.0
Reporter: Nathan Smith

http://pig.apache.org/docs/r0.15.0/func.html#isempty

The documentation example uses a left outer join, but this produces a flat tuple, which is invalid for IsEmpty. I believe the example in the docs should be:

{code}
SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);
/* do a cogroup of SSN with SSN_Name */
X = COGROUP SSN by ssn, SSN_NAME by ssn;
/* only keep those ssn's for which there is no name */
Y = filter X by IsEmpty(SSN_NAME);
{code}
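To see why the cogroup form suits IsEmpty, it can help to model the grouping outside Pig. The following is a rough Java analogy, not Pig's execution model: each key ends up paired with a bag (here a List) of matching names, and the emptiness check applies to that bag. The class and method names and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough analogy for "COGROUP ... then filter by IsEmpty(bag)"; all names are hypothetical.
final class CogroupIsEmptySketch {
    // Returns the ssns that have no name, by grouping first and then
    // keeping only the keys whose bag of names is empty.
    static List<Long> ssnsWithoutName(List<Long> ssns, Map<Long, List<String>> namesBySsn) {
        // "Cogroup": one bag of names per ssn; the bag is empty when nothing matches.
        Map<Long, List<String>> grouped = new LinkedHashMap<>();
        for (Long ssn : ssns) {
            grouped.put(ssn, namesBySsn.getOrDefault(ssn, Collections.emptyList()));
        }
        // "Filter by IsEmpty": the test is on the bag, not on a flat field.
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, List<String>> entry : grouped.entrySet()) {
            if (entry.getValue().isEmpty()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> names = new HashMap<>();
        names.put(1L, Arrays.asList("alice"));
        names.put(3L, Arrays.asList("carol"));
        System.out.println(ssnsWithoutName(Arrays.asList(1L, 2L, 3L), names));  // prints [2]
    }
}
```

A join, by contrast, would yield flat records with individual fields, so there would be no bag for an emptiness test to inspect. The sketch only walks keys from the first input; keys present only on the name side would have a non-empty bag and be filtered out anyway.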
[jira] [Updated] (PIG-4721) IsEmpty documentation error
[ https://issues.apache.org/jira/browse/PIG-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Smith updated PIG-4721:
------------------------------
    Priority: Trivial (was: Major)

> IsEmpty documentation error
> ---------------------------
>
> Key: PIG-4721
> URL: https://issues.apache.org/jira/browse/PIG-4721
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.15.0
> Reporter: Nathan Smith
> Priority: Trivial
>
> http://pig.apache.org/docs/r0.15.0/func.html#isempty
> The documentation example uses a left outer join, but this produces a flat tuple, which is invalid for IsEmpty. I believe the example in the docs should be:
> {code}
> SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
> SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);
> /* do a cogroup of SSN with SSN_Name */
> X = COGROUP SSN by ssn, SSN_NAME by ssn;
> /* only keep those ssn's for which there is no name */
> Y = filter X by IsEmpty(SSN_NAME);
> {code}
[jira] [Commented] (PIG-4634) Fix records count issues in output statistics
[ https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982908#comment-14982908 ]

Mohit Sabharwal commented on PIG-4634:
--------------------------------------

Thanks, [~kexianda]! +1 (non-binding)

> Fix records count issues in output statistics
> ---------------------------------------------
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634-4.patch, PIG-4634-5.patch, PIG-4634-6.patch, PIG-4634.patch, PIG-4634_2.patch
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner failed, caused by the following issues:
> 1. pig context in SparkPigStats isn't initialized.
> 2. the records count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and getRecordWritten() have not been implemented.
[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.
[ https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982965#comment-14982965 ]

Akshay Rai commented on PIG-4417:
---------------------------------

Thanks [~daijy]. I will update the patch soon. I might need some help with the tests.

> Pig's register command should support automatic fetching of jars from repo.
> ---------------------------------------------------------------------------
>
> Key: PIG-4417
> URL: https://issues.apache.org/jira/browse/PIG-4417
> Project: Pig
> Issue Type: Improvement
> Reporter: Akshay Rai
> Assignee: Akshay Rai
> Attachments: PIG-4417.1.patch, PIG-4417.2.patch
>
> Currently Pig's register command takes a local path to a dependency jar.
> This clutters the local file-system as users may forget to remove this jar later.
> It would be nice if Pig supported a Gradle like notation to download the jar from a repository.
> Ex: At the top of the Pig script a user could add
> register '::';
> It should be backward compatible and should support a local file path if so desired.
> RB: https://reviews.apache.org/r/31662/
[jira] [Updated] (PIG-4721) IsEmpty documentation error
[ https://issues.apache.org/jira/browse/PIG-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4721:
----------------------------
    Attachment: PIG-4721.patch

> IsEmpty documentation error
> ---------------------------
>
> Key: PIG-4721
> URL: https://issues.apache.org/jira/browse/PIG-4721
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.15.0
> Reporter: Nathan Smith
> Priority: Trivial
> Attachments: PIG-4721.patch
>
> http://pig.apache.org/docs/r0.15.0/func.html#isempty
> The documentation example uses a left outer join, but this produces a flat tuple, which is invalid for IsEmpty. I believe the example in the docs should be:
> {code}
> SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
> SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);
> /* do a cogroup of SSN with SSN_Name */
> X = COGROUP SSN by ssn, SSN_NAME by ssn;
> /* only keep those ssn's for which there is no name */
> Y = filter X by IsEmpty(SSN_NAME);
> {code}
[jira] [Resolved] (PIG-4721) IsEmpty documentation error
[ https://issues.apache.org/jira/browse/PIG-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-4721.
-----------------------------
    Resolution: Fixed
    Assignee: Nathan Smith
    Fix Version/s: 0.16.0

Fixed. Thanks Nathan!

> IsEmpty documentation error
> ---------------------------
>
> Key: PIG-4721
> URL: https://issues.apache.org/jira/browse/PIG-4721
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.15.0
> Reporter: Nathan Smith
> Assignee: Nathan Smith
> Priority: Trivial
> Fix For: 0.16.0
>
> Attachments: PIG-4721.patch
>
> http://pig.apache.org/docs/r0.15.0/func.html#isempty
> The documentation example uses a left outer join, but this produces a flat tuple, which is invalid for IsEmpty. I believe the example in the docs should be:
> {code}
> SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
> SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);
> /* do a cogroup of SSN with SSN_Name */
> X = COGROUP SSN by ssn, SSN_NAME by ssn;
> /* only keep those ssn's for which there is no name */
> Y = filter X by IsEmpty(SSN_NAME);
> {code}