[jira] [Commented] (PIG-5112) Cleanup pig-template.xml
[ https://issues.apache.org/jira/browse/PIG-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838166#comment-15838166 ] Thejas M Nair commented on PIG-5112: +1 > Cleanup pig-template.xml > > > Key: PIG-5112 > URL: https://issues.apache.org/jira/browse/PIG-5112 > Project: Pig > Issue Type: Bug > Components: build >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.17.0 > > Attachments: PIG-5112-1.patch > > > Several entries in pig-template.xml are outdated. Attach a patch to remove or > update those entries. Later we shall use ivy:makepom to generate pig.pom and > lib dir, I will open a separate ticket for that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4972) StreamingIO_1 fail on perl 5.22
[ https://issues.apache.org/jira/browse/PIG-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450030#comment-15450030 ] Thejas M Nair commented on PIG-4972: +1 > StreamingIO_1 fail on perl 5.22 > --- > > Key: PIG-4972 > URL: https://issues.apache.org/jira/browse/PIG-4972 > Project: Pig > Issue Type: Bug > Components: e2e harness >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.17.0 > > Attachments: PIG-4972-1.patch > > > Saw StreamingIO_1 fail on a particular perl version due to a warning in > PigStreaming.pl. You can see the warning in any version of perl using "perl > -w": > {code} > defined(%hash) is deprecated at streaming/PigStreaming.pl line 76. > (Maybe you should just omit the defined()?) > {code} > In some particular versions of perl, the warning check is mandatory and the perl > script just fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs
[ https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694026#comment-14694026 ] Thejas M Nair commented on PIG-1472: I don't remember if I had looked into WritableUtils.writeVInt back then or if it was available with the pig version being used back then (it's been 5 years! :) ) Would using WritableUtils.writeVInt mean that an extra byte needs to be used for storing the type? i.e. bag vs map vs tuple. For complex types, savings are more noticeable for smaller sizes. For a bag of size 32768, a one-byte saving won't be significant. However, for an int of size 32768, the saving of one byte is significant. > Optimize serialization/deserialization between Map and Reduce and between MR > jobs > - > > Key: PIG-1472 > URL: https://issues.apache.org/jira/browse/PIG-1472 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.8.0 > > Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.4.patch, > PIG-1472.patch > > > In certain types of pig queries most of the execution time is spent in > serializing/deserializing (sedes) records between Map and Reduce and between > MR jobs. > For example, if PigMix queries are modified to specify types for all the > fields in the load statement schema, some of the queries (L2,L3,L9, L10 in > pigmix v1) that have records with bags and maps being transmitted across map > or reduce boundaries run a lot longer (a runtime increase of a few times has been > seen). > There are a few optimizations that have been shown to improve the performance of > sedes in my tests - > 1. Use a smaller number of bytes to store the length of the column. For example, if > a bytearray is smaller than 255 bytes, a byte can be used to store the > length instead of the integer that is currently used. > 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and > DataInput.readUTF. This reduces the cost of serialization by more than 1/2. > Zebra and BinStorage are known to use DefaultTuple sedes functionality. The > serialization format that these loaders use cannot change, so after the > optimization their format is going to be different from the format used > between M/R boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
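For reference, a minimal, hypothetical Java sketch of the size difference being discussed: a fixed 4-byte length versus Hadoop's WritableUtils.writeVInt variable-length encoding. This is not Pig's sedes code; the class name and sample values are illustrative only.
{code}
// Hypothetical size comparison; not Pig's actual serialization code.
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.WritableUtils;

public class VIntSizeDemo {
    public static void main(String[] args) throws IOException {
        for (int value : new int[] {1, 127, 255, 32768}) {
            ByteArrayOutputStream fixed = new ByteArrayOutputStream();
            new DataOutputStream(fixed).writeInt(value);                 // always 4 bytes
            ByteArrayOutputStream vint = new ByteArrayOutputStream();
            WritableUtils.writeVInt(new DataOutputStream(vint), value);  // 1-5 bytes
            System.out.printf("value=%-6d fixed=%d bytes, vint=%d bytes%n",
                    value, fixed.size(), vint.size());
        }
    }
}
{code}
For small values the variable-length form saves most of the bytes, while for a large collection the one-byte saving on the length prefix is negligible relative to the payload, which matches the point made in the comment above.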
[jira] [Updated] (PIG-4624) Error on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Attachment: PIG-4624.1.patch > Error on ORC empty file without schema > -- > > Key: PIG-4624 > URL: https://issues.apache.org/jira/browse/PIG-4624 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.16.0, 0.15.1 > > Attachments: PIG-4624.1.patch > > > If ORC produces an empty file without schema (which ideally, it is not > supposed to), then pig query reading the data gives the following error - > "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4624) Error on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Attachment: (was: PIG-4624.1.patch) > Error on ORC empty file without schema > -- > > Key: PIG-4624 > URL: https://issues.apache.org/jira/browse/PIG-4624 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.16.0, 0.15.1 > > Attachments: PIG-4624.1.patch > > > If ORC produces an empty file without schema (which ideally, it is not > supposed to), then pig query reading the data gives the following error - > "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4624) Error on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Attachment: PIG-4624.1.patch > Error on ORC empty file without schema > -- > > Key: PIG-4624 > URL: https://issues.apache.org/jira/browse/PIG-4624 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.16.0, 0.15.1 > > Attachments: PIG-4624.1.patch > > > If ORC produces an empty file without schema (which ideally, it is not > supposed to), then pig query reading the data gives the following error - > "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4624) Error on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Summary: Error on ORC empty file without schema (was: pig errors out on ORC empty file without schema) > Error on ORC empty file without schema > -- > > Key: PIG-4624 > URL: https://issues.apache.org/jira/browse/PIG-4624 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.16.0, 0.15.1 > > Attachments: PIG-4624.1.patch > > > If ORC produces an empty file without schema (which ideally, it is not > supposed to), then pig query reading the data gives the following error - > "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Attachment: PIG-4624.1.patch The ORC issue should be addressed separately in ORC/Hive; however, it would be good if Pig could handle this case for already-generated files. Attaching patch from [~daijy]. > pig errors out on ORC empty file without schema > --- > > Key: PIG-4624 > URL: https://issues.apache.org/jira/browse/PIG-4624 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.16.0, 0.15.1 > > Attachments: PIG-4624.1.patch > > > If ORC produces an empty file without schema (which ideally, it is not > supposed to), then pig query reading the data gives the following error - > "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Fix Version/s: 0.15.1 0.16.0 > pig errors out on ORC empty file without schema > --- > > Key: PIG-4624 > URL: https://issues.apache.org/jira/browse/PIG-4624 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.16.0, 0.15.1 > > > If ORC produces an empty file without schema (which ideally, it is not > supposed to), then pig query reading the data gives the following error - > "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PIG-4624) pig errors out on ORC empty file without schema
Thejas M Nair created PIG-4624: -- Summary: pig errors out on ORC empty file without schema Key: PIG-4624 URL: https://issues.apache.org/jira/browse/PIG-4624 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Daniel Dai If ORC produces an empty file without schema (which ideally, it is not supposed to), then pig query reading the data gives the following error - "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4556) Local mode is broken in some case by PIG-4247
[ https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555165#comment-14555165 ] Thejas M Nair commented on PIG-4556: I believe s3 is used in non-local modes as well. Change looks good to me. (for my reference, since it took me few mins to figure out, the real change is in order in which params are passed to ConfigurationUtil.mergeConf). > Local mode is broken in some case by PIG-4247 > - > > Key: PIG-4556 > URL: https://issues.apache.org/jira/browse/PIG-4556 > Project: Pig > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.15.0 > > Attachments: PIG-4556-1.patch, PIG-4556-2.patch > > > HExecutionEngine.getS3Conf is wrong. It should only return s3 config. > Currently it will return all the properties, including *-site.xml even in > local mode. In one particular case, mapred-site.xml contains > "mapreduce.application.framework.path", this will going to the local mode > config, thus we see the exception: > {code} > Message: java.io.FileNotFoundException: File > file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460) > at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157) > at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137) > at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133) > at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595) > at > org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293) > at > org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128) > at > org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194) > at java.lang.Thread.run(Thread.java:745) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
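To illustrate the merge-order point in the comment above, here is a tiny, self-contained Java sketch. It does not reproduce ConfigurationUtil.mergeConf; the property names and values are made up. It only shows that whichever set of properties is merged last wins on key collisions, which is why the argument order matters.
{code}
// Illustrative only: the order of the two arguments to a merge decides which side wins.
import java.util.Properties;

public class MergeOrderDemo {
    static Properties merge(Properties base, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(base);
        merged.putAll(overrides);   // the later putAll wins on key collisions
        return merged;
    }

    public static void main(String[] args) {
        Properties siteConf = new Properties();   // stands in for *-site.xml properties
        siteConf.setProperty("some.key", "from-site-xml");
        Properties localConf = new Properties();  // stands in for the local-mode config
        localConf.setProperty("some.key", "for-local-mode");

        System.out.println(merge(localConf, siteConf));   // {some.key=from-site-xml}
        System.out.println(merge(siteConf, localConf));   // {some.key=for-local-mode}
    }
}
{code}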
[jira] [Updated] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change
[ https://issues.apache.org/jira/browse/PIG-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4514: --- Attachment: PIG-4514.1.patch > pig trunk compilation is broken - > VertexManagerPluginContext.reconfigureVertex change > - > > Key: PIG-4514 > URL: https://issues.apache.org/jira/browse/PIG-4514 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.15.0 > > Attachments: PIG-4514.1.patch > > > {code} > src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173: > error: exception TezException is never thrown in body of corresponding try > statement > [javac] } catch (TezException e) { > [javac] ^ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change
Thejas M Nair created PIG-4514: -- Summary: pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change Key: PIG-4514 URL: https://issues.apache.org/jira/browse/PIG-4514 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.15.0 {code} src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173: error: exception TezException is never thrown in body of corresponding try statement [javac] } catch (TezException e) { [javac] ^ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown
[ https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498812#comment-14498812 ] Thejas M Nair commented on PIG-4509: +1 The change looks good to me. Thanks Rohini! > [Pig on Tez] Unassigned applications not killed on shutdown > --- > > Key: PIG-4509 > URL: https://issues.apache.org/jira/browse/PIG-4509 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.15.0 > > Attachments: PIG-4509-1.patch, PIG-4509-FixCompileError.patch > > > tezclient.stop() should be called when tezClient.waitTillReady() is > interrupted on shutdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown
[ https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498787#comment-14498787 ] Thejas M Nair commented on PIG-4509: It builds fine on my Mac as well with JDK 7. However, it is failing with JDK 7 in our internal build environment as well (probably Linux). The fact that it passes in some setups is certainly very strange. I think we should still go ahead and fix this; as far as I know, this should result in a syntax error. > [Pig on Tez] Unassigned applications not killed on shutdown > --- > > Key: PIG-4509 > URL: https://issues.apache.org/jira/browse/PIG-4509 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.15.0 > > Attachments: PIG-4509-1.patch > > > tezclient.stop() should be called when tezClient.waitTillReady() is > interrupted on shutdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown
[ https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498325#comment-14498325 ] Thejas M Nair commented on PIG-4509: [~rohini] This results in a compilation failure. {code} src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java:105: error: unreported exception Throwable; must be caught or declared to be thrown [javac] throw e; [javac] ^ {code} > [Pig on Tez] Unassigned applications not killed on shutdown > --- > > Key: PIG-4509 > URL: https://issues.apache.org/jira/browse/PIG-4509 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.15.0 > > Attachments: PIG-4509-1.patch > > > tezclient.stop() should be called when tezClient.waitTillReady() is > interrupted on shutdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
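As a side note, here is a minimal, self-contained sketch of the pattern behind the error message above (this is not the actual TezSessionManager code). Rethrowing a caught Throwable compiles without a "throws Throwable" declaration only when Java 7's precise-rethrow analysis applies, i.e. the catch parameter is effectively final and the build uses -source 1.7 or later; whether that explains why some environments accept the patch and others reject it is an assumption, not something confirmed from the Pig build files.
{code}
// Sketch of Java 7 "precise rethrow"; illustrative only.
public class RethrowDemo {
    static void doWork() throws InterruptedException {
        throw new InterruptedException("shutdown requested");
    }

    static void run() throws InterruptedException {   // no "throws Throwable" needed under -source 1.7+
        try {
            doWork();
        } catch (Throwable e) {
            System.err.println("cleaning up before rethrow");
            throw e;   // with -source 1.6 this line is "unreported exception Throwable"
        }
    }

    public static void main(String[] args) throws InterruptedException {
        run();
    }
}
{code}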
[jira] [Created] (PIG-4486) set Tez ACLs appropriately in hive
Thejas M Nair created PIG-4486: -- Summary: set Tez ACLs appropriately in hive Key: PIG-4486 URL: https://issues.apache.org/jira/browse/PIG-4486 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Hive should make the necessary changes to integrate with Tez and Timeline. It should pass the necessary ACL-related params to ensure that query execution + logs are only visible to the relevant users. Proposed Changes - Set a session-level Tez ACL for a super user, to allow modify + view. Set a DAG-level ACL for the user running the query (the end user), to allow modify + view. Determining the super user - The super user can be configured using hive.tez.admin.user. This can be initialized by the Authorization implementation (such as SQL standard authorization) if it is not already set to a specific value. SQL standard authorization would initialize it to the SQL standard admin user if it is unset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4331: --- Status: Patch Available (was: Open) > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.15.0 > > Attachments: PIG-4331.1.patch > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4331: --- Attachment: PIG-4331.1.patch > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.15.0 > > Attachments: PIG-4331.1.patch > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PIG-4331) update README, '-x' option in usage to include tez
Thejas M Nair created PIG-4331: -- Summary: update README, '-x' option in usage to include tez Key: PIG-4331 URL: https://issues.apache.org/jira/browse/PIG-4331 Project: Pig Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.15.0 Pig queries can be run using tez, by specifying "pig -x tez". But usage does not indicate this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4331: --- Description: Pig queries can be run using tez, by specifying "pig -x tez". The output of pig --help needs to be updated to indicate that. (was: Pig queries can be run using tez, by specifying "pig -x tez". But usage does not indicate this. ) > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.15.0 > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4328) Upgrade Hive to 0.14
[ https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209090#comment-14209090 ] Thejas M Nair commented on PIG-4328: +1 > Upgrade Hive to 0.14 > > > Key: PIG-4328 > URL: https://issues.apache.org/jira/browse/PIG-4328 > Project: Pig > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.14.0 > > Attachments: PIG-4328-1.patch > > > Hive 0.14.0 artifacts are available. We shall switch to use the released > version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4250) Fix Security Risks found by Coverity
[ https://issues.apache.org/jira/browse/PIG-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191114#comment-14191114 ] Thejas M Nair commented on PIG-4250: Blocks such as the following aren't necessary - {code} } catch (IOException e) { throw e; } {code} You can use Hadoop IOUtils.closeStream or cleanup to call close. You don't need the additional try-catch and null check with that. > Fix Security Risks found by Coverity > > > Key: PIG-4250 > URL: https://issues.apache.org/jira/browse/PIG-4250 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.14.0 > > Attachments: PIG-4250-1.patch, PIG-4250-2.patch > > > Here is the report: https://scan.coverity.com/projects/3026 (Need to register > to see). Most belong to one pattern: not closing the stream when an exception happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
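A short sketch of the suggested pattern (illustrative only; the class name and method are made up, not taken from the Pig patch): Hadoop's IOUtils.closeStream is null-safe and swallows the IOException from close(), so no extra try-catch or null check is needed in the finally block.
{code}
// Illustrative sketch of closing a stream with Hadoop's IOUtils.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.io.IOUtils;

public class CloseStreamDemo {
    static int readFirstByte(String path) throws IOException {
        InputStream in = null;
        try {
            in = new FileInputStream(path);
            return in.read();
        } finally {
            IOUtils.closeStream(in);   // null-safe; ignores exceptions thrown by close()
        }
    }
}
{code}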
[jira] [Commented] (PIG-4160) Provide a way to pass local jars in pig.additional.jars when using a remote url for a script
[ https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190848#comment-14190848 ] Thejas M Nair commented on PIG-4160: +1 > Provide a way to pass local jars in pig.additional.jars when using a remote > url for a script > > > Key: PIG-4160 > URL: https://issues.apache.org/jira/browse/PIG-4160 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.14.0 >Reporter: Andrew C. Oliver > Labels: patch > Fix For: 0.14.0 > > Attachments: PIG-4160-2.patch, PIG-4160-3.patch, PIG-4160-4.patch, > forcelocal.trunk.patch, forcelocal.withtests.patch > > Original Estimate: 3h > Remaining Estimate: 3h > > patch adds a -j/forcelocaljars flag which if enabled allows you to do > pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig > thus loading the pig script REMOTELY > while loading the jar files LOCALLY > One does this to avoid a single point of failure but avoid one central > interversion dependent repository for all the jars across all teams/projects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script
[ https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190819#comment-14190819 ] Thejas M Nair commented on PIG-4160: I think it would be useful to retain mention of the old additional.jars parameter in the web docs, saying that it is similar to .jars.comma but separated by the OS path character and does not allow a scheme to be specified (and that the use of jars.comma is preferred). > -forcelocaljars / -j flag when using a remote url for a script > -- > > Key: PIG-4160 > URL: https://issues.apache.org/jira/browse/PIG-4160 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.14.0 >Reporter: Andrew C. Oliver > Labels: patch > Fix For: 0.14.0 > > Attachments: PIG-4160-2.patch, PIG-4160-3.patch, > forcelocal.trunk.patch, forcelocal.withtests.patch > > Original Estimate: 3h > Remaining Estimate: 3h > > patch adds a -j/forcelocaljars flag which if enabled allows you to do > pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig > thus loading the pig script REMOTELY > while loading the jar files LOCALLY > One does this to avoid a single point of failure but avoid one central > interversion dependent repository for all the jars across all teams/projects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script
[ https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190821#comment-14190821 ] Thejas M Nair commented on PIG-4160: I think we should also change the jira title and description to indicate what is being implemented. > -forcelocaljars / -j flag when using a remote url for a script > -- > > Key: PIG-4160 > URL: https://issues.apache.org/jira/browse/PIG-4160 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.14.0 >Reporter: Andrew C. Oliver > Labels: patch > Fix For: 0.14.0 > > Attachments: PIG-4160-2.patch, PIG-4160-3.patch, > forcelocal.trunk.patch, forcelocal.withtests.patch > > Original Estimate: 3h > Remaining Estimate: 3h > > patch adds a -j/forcelocaljars flag which if enabled allows you to do > pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig > thus loading the pig script REMOTELY > while loading the jar files LOCALLY > One does this to avoid a single point of failure but avoid one central > interversion dependent repository for all the jars across all teams/projects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script
[ https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190713#comment-14190713 ] Thejas M Nair commented on PIG-4160: Daniel, can you also include the doc edits ? Also the commented line "#HADOOP_OPTS" can be deleted. > -forcelocaljars / -j flag when using a remote url for a script > -- > > Key: PIG-4160 > URL: https://issues.apache.org/jira/browse/PIG-4160 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.14.0 >Reporter: Andrew C. Oliver > Labels: patch > Fix For: 0.14.0 > > Attachments: PIG-4160-2.patch, forcelocal.trunk.patch, > forcelocal.withtests.patch > > Original Estimate: 3h > Remaining Estimate: 3h > > patch adds a -j/forcelocaljars flag which if enabled allows you to do > pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig > thus loading the pig script REMOTELY > while loading the jar files LOCALLY > One does this to avoid a single point of failure but avoid one central > interversion dependent repository for all the jars across all teams/projects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4151) Pig Cannot Write Empty Maps to HBase
[ https://issues.apache.org/jira/browse/PIG-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171925#comment-14171925 ] Thejas M Nair commented on PIG-4151: +1 > Pig Cannot Write Empty Maps to HBase > > > Key: PIG-4151 > URL: https://issues.apache.org/jira/browse/PIG-4151 > Project: Pig > Issue Type: Bug > Components: internal-udfs >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.14.0 > > Attachments: PIG-4151-1.patch > > > Pig is unable to write empty maps to HBase. Instruction for reproduce: > input file pig_data_bad.txt: > {code} > row1;Homer;Morrison;[1#Silvia,2#Stacy] > row2;Sheila;Fletcher;[1#Becky,2#Salvador,3#Lois] > row4;Andre;Morton;[1#Nancy] > row3;Sonja;Webb;[] > {code} > Create table in hbase: > create 'test', 'info', 'friends' > Pig script: > {code} > source = LOAD '/pig_data_bad.txt' USING PigStorage(';') AS (row:chararray, > first_name:chararray, last_name:chararray, friends:map[]); > STORE source INTO 'hbase://test' USING > org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:fname info:lname > friends:*'); > {code} > Stack: > java.lang.NullPointerException > at > org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:880) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4141) Ship UDF/LoadFunc/StoreFunc dependent jar automatically
[ https://issues.apache.org/jira/browse/PIG-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136229#comment-14136229 ] Thejas M Nair commented on PIG-4141: +1 > Ship UDF/LoadFunc/StoreFunc dependent jar automatically > --- > > Key: PIG-4141 > URL: https://issues.apache.org/jira/browse/PIG-4141 > Project: Pig > Issue Type: New Feature > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.14.0 > > Attachments: PIG-4141-1.patch, PIG-4141-2.patch, PIG-4141-3.patch, > PIG-4141-4.patch, PIG-4141-5.patch > > > When user use AvroStorage/JsonStorage/OrcStorage, they need to register > dependent jars manually. It would be much convenient if we can provide a > mechanism for UDF/LoadFunc/StoreFunc to claim the dependency and ship jars > automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4128) New logical optimizer rule: ConstantCalculator
[ https://issues.apache.org/jira/browse/PIG-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111505#comment-14111505 ] Thejas M Nair commented on PIG-4128: +1 > New logical optimizer rule: ConstantCalculator > -- > > Key: PIG-4128 > URL: https://issues.apache.org/jira/browse/PIG-4128 > Project: Pig > Issue Type: New Feature > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.14.0 > > Attachments: PIG-4128-1.patch, PIG-4128-2.patch, PIG-4128-3.patch > > > Pig used to have a LogicExpressionSimplifier to simplify expressions, which > also calculated constant expressions. The optimizer rule is buggy and we > disabled it by default in PIG-2316. > However, we do need this feature, especially in partition/predicate push down: > since neither deals with complex constant expressions, we'd like to > replace the expression with a constant before the actual push down. Yes, a user > may manually do the calculation and rewrite the query, but even a rewrite is > sometimes not possible. Consider the case where a user wants to push a datetime > predicate: the user has to write a ToDate udf since Pig does not have a datetime > constant. > In this Jira, I provide a new rule: ConstantCalculator, which is much simpler > and much less error-prone, to replace LogicExpressionSimplifier. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3522) Remove shock from pig
[ https://issues.apache.org/jira/browse/PIG-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811774#comment-13811774 ] Thejas M Nair commented on PIG-3522: +1 > Remove shock from pig > - > > Key: PIG-3522 > URL: https://issues.apache.org/jira/browse/PIG-3522 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.13.0 > > Attachments: PIG-3522-1.patch > > > It is only used in very ancient Hadoop versions which used HOD as the resource manager. > Current Pig code does not use it. This includes the entire lib-src/shock > directory and jsch.jar -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3503) More document for Pig 0.12 new features
[ https://issues.apache.org/jira/browse/PIG-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787319#comment-13787319 ] Thejas M Nair commented on PIG-3503: Everything else looks good. You can commit after the changes. > More document for Pig 0.12 new features > --- > > Key: PIG-3503 > URL: https://issues.apache.org/jira/browse/PIG-3503 > Project: Pig > Issue Type: Improvement > Components: documentation >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3503-1.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3503) More document for Pig 0.12 new features
[ https://issues.apache.org/jira/browse/PIG-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787318#comment-13787318 ] Thejas M Nair commented on PIG-3503: "If use set command without providing key/value pair, Pig print all the configurations and all system properties. " can be changed to "If set command is used without key/value pair argument, Pig prints all the configurations and system properties." In perf.xml: In the example, should we use a load function that supports partition filter pushdown? Otherwise, people might expect it to work with PigStorage. Also, should the example in it that has no filter statement be removed? {code} + +A = LOAD 'input' as (dt, state, event); + {code} > More document for Pig 0.12 new features > --- > > Key: PIG-3503 > URL: https://issues.apache.org/jira/browse/PIG-3503 > Project: Pig > Issue Type: Improvement > Components: documentation >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3503-1.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3360) Some intermittent negative e2e tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/PIG-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776935#comment-13776935 ] Thejas M Nair commented on PIG-3360: Looks good. +1 > Some intermittent negative e2e tests fail on hadoop 2 > - > > Key: PIG-3360 > URL: https://issues.apache.org/jira/browse/PIG-3360 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3360-1.patch, PIG-3360-2.patch > > > One example is StreamingErrors_2. Here is the stack we get: > Backend error message > - > Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: > Received Error while processing the map plan: 'perl PigStreamingBad.pl middle > ' failed with exit status: 2 > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:311) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157) > Pig Stack Trace > --- > ERROR 2244: Job failed, hadoop does not return any error message > org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, > hadoop does not return any error message > at > org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:145) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) > at org.apache.pig.Main.run(Main.java:604) > at org.apache.pig.Main.main(Main.java:157) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions
[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614137#comment-13614137 ] Thejas M Nair commented on PIG-3259: bq. How do we determine the number of non-numbers without making calls to sanityCheck..()? By counting the number of times an exception has so far been thrown by .valueOf(). Once a threshold has been crossed, we can introduce the sanity check for each new value. This will put a limit on worst ('incorrect') case performance without degrading the 'correct' case performance by much. I wonder if there are good libraries that we can use for the sanity checks, as the decimal check seems a bit more complicated. > Optimize byte to Long/Integer conversions > - > > Key: PIG-3259 > URL: https://issues.apache.org/jira/browse/PIG-3259 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11, 0.11.1 >Reporter: Prashant Kommireddi >Assignee: Prashant Kommireddi > Fix For: 0.12 > > Attachments: byteToLong.xlsx > > > These conversions could be performing better. If the input is not numeric > (1234abcd) the code calls Double.valueOf(String) regardless before finally > returning null. Any script that inadvertently (user's mistake or not) tries > to cast a non-numeric column to int or long would result in many wasteful > calls. > We can avoid this and only handle the cases where we find the input to be a decimal > number (1234.56), returning null otherwise even before trying > Double.valueOf(String). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
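A minimal sketch of the adaptive idea described above: attempt the cheap parse first and only turn on an up-front sanity check after a threshold of failures. The class name, the threshold, and the simplified numeric check (which ignores the decimal and exponent forms discussed in this thread) are assumptions for illustration, not Pig's implementation.
{code}
// Illustrative only: adaptive sanity checking for byte-array-to-Long casts.
public class AdaptiveLongCaster {
    private static final int FAILURE_THRESHOLD = 100;  // assumed value, not from Pig
    private long parseFailures = 0;

    public Long bytesToLong(byte[] bytes) {
        String s = new String(bytes, java.nio.charset.StandardCharsets.UTF_8).trim();
        // Once many non-numbers have been seen, reject obvious garbage cheaply
        // instead of paying for the exception-throwing parse every time.
        if (parseFailures > FAILURE_THRESHOLD && !looksNumeric(s)) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            parseFailures++;
            return null;   // Pig semantics: a bad cast yields null
        }
    }

    // Simplified check: optional sign followed by digits only.
    private boolean looksNumeric(String s) {
        if (s.isEmpty()) return false;
        int i = (s.charAt(0) == '-' || s.charAt(0) == '+') ? 1 : 0;
        if (i == s.length()) return false;
        for (; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) return false;
        }
        return true;
    }
}
{code}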
[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions
[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613354#comment-13613354 ] Thejas M Nair commented on PIG-3259: Sounds like a good idea. The check you have here does not accept all valid double string representations (see http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#valueOf(java.lang.String) ), e.g. with an exponent, or a hexadecimal representation starting with 0x. But if we can avoid the performance degradation for the 'correct' [1] case (which seems to be in the range of 2-8% in the micro benchmark that ran for at least a few seconds), that would be better. One way to avoid performance degradation for the 'correct' case would be to start by doing .valueOf() without checks, then use the number of non-numbers encountered to decide if we want to be making the sanityCheckIntegerLongDecimal() calls. [1] - by correct I mean the case where the field declared as an integer or a double has a correct representation. > Optimize byte to Long/Integer conversions > - > > Key: PIG-3259 > URL: https://issues.apache.org/jira/browse/PIG-3259 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11, 0.11.1 >Reporter: Prashant Kommireddi >Assignee: Prashant Kommireddi > Fix For: 0.12 > > Attachments: byteToLong.xlsx > > > These conversions could be performing better. If the input is not numeric > (1234abcd) the code calls Double.valueOf(String) regardless before finally > returning null. Any script that inadvertently (user's mistake or not) tries > to cast a non-numeric column to int or long would result in many wasteful > calls. > We can avoid this and only handle the cases where we find the input to be a decimal > number (1234.56), returning null otherwise even before trying > Double.valueOf(String). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
[ https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605615#comment-13605615 ] Thejas M Nair commented on PIG-3248: +1 > Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha > > > Key: PIG-3248 > URL: https://issues.apache.org/jira/browse/PIG-3248 > Project: Pig > Issue Type: Improvement >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12 > > Attachments: PIG-3248-1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
[ https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605584#comment-13605584 ] Thejas M Nair commented on PIG-3248: bq. This is actually a different issue. TestPigSplit always fail on my test machine due to stack overflow. It is Ok if you want me open a separate Jira for this issue. As it is a very minor change to a test case, I think it's fine to include it here. 200 is large enough for the number of statements, so I think that many levels instead of 500 should be OK, if it is causing issues with some JVM configs. > Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha > > > Key: PIG-3248 > URL: https://issues.apache.org/jira/browse/PIG-3248 > Project: Pig > Issue Type: Improvement >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12 > > Attachments: PIG-3248-1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
[ https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605460#comment-13605460 ] Thejas M Nair commented on PIG-3248: Daniel, Is the following change relevant for the hadoop 2 version upgrade ? {code} === --- test/org/apache/pig/test/TestPigSplit.java (revision 1456106) +++ test/org/apache/pig/test/TestPigSplit.java (working copy) @@ -108,7 +108,7 @@ createInput(new String[] { "0\ta" }); pigServer.registerQuery("a = load '" + inputFileName + "';"); -for (int i = 0; i < 500; i++) { +for (int i = 0; i < 200; i++) { pigServer.registerQuery("a = filter a by $0 == '1';"); } Iterator iter = pigServer.openIterator("a"); {code} > Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha > > > Key: PIG-3248 > URL: https://issues.apache.org/jira/browse/PIG-3248 > Project: Pig > Issue Type: Improvement >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12 > > Attachments: PIG-3248-1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3214) New/improved mascot
[ https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-3214: --- Attachment: pig_6_lc_g.JPG I think lower case 'g' in Julien's suggestion will make the tail attachment look better. Attaching pig_6_lc_g.JPG to illustrate what I am thinking. > New/improved mascot > --- > > Key: PIG-3214 > URL: https://issues.apache.org/jira/browse/PIG-3214 > Project: Pig > Issue Type: Wish > Components: site >Affects Versions: 0.11 >Reporter: Andrew Musselman >Priority: Minor > Fix For: 0.12 > > Attachments: apache-pig-14.png, apache-pig-yellow-logo.png, > newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png, > new_logo_7.png, pig_6.JPG, pig_6_lc_g.JPG, pig-logo-10.png, pig-logo-11.png, > pig-logo-12.png, pig-logo-13.png, pig-logo-8a.png, pig-logo-8b.png, > pig-logo-9a.png, pig-logo-9b.png, pig_logo_new.png > > > Request to change pig mascot to something more graphically appealing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3089) Implicit relation names
[ https://issues.apache.org/jira/browse/PIG-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529125#comment-13529125 ] Thejas M Nair commented on PIG-3089: In my opinion, too many rules for implicit relation names would make pig scripts (written by others) hard to read, especially for people who are new to pig. I think it is better to just allow the name of the preceding relation to be referred to using a special notation. > Implicit relation names > --- > > Key: PIG-3089 > URL: https://issues.apache.org/jira/browse/PIG-3089 > Project: Pig > Issue Type: New Feature > Components: grunt, parser >Reporter: Russell Jurney >Assignee: Jonathan Coveney > > A = load foo; > B = load bar; > filter A by id > 5; > join A_1 by id, B by id; > // or A_filter > foreach A_1_B generate id; > store into foobar; // A_1_B_1 or A_filter_B_generate > Or some such routine? > We don't have to be explicit no more! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3044) Trigger POPartialAgg compaction under GC pressure
[ https://issues.apache.org/jira/browse/PIG-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526600#comment-13526600 ] Thejas M Nair commented on PIG-3044: bq. I would even say we remove the % memory budget as the Spillable mechanism is more reliable and much simpler. The reason the % memory budget was introduced for SelfSpillBag was that the spillable mechanism didn't always work well: the cleanup was often getting triggered too late. So I think it is better to use the Spillable mechanism here to spill earlier if necessary, as the patch is doing. > Trigger POPartialAgg compaction under GC pressure > - > > Key: PIG-3044 > URL: https://issues.apache.org/jira/browse/PIG-3044 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.10.0, 0.11, 0.10.1 >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy > Fix For: 0.11, 0.12 > > Attachments: PIG-3044.2.diff, PIG-3404.diff > > > If partial aggregation is turned on in pig 10 and 11, 20% (by default) of the > available heap can be consumed by the POPartialAgg operator. This can cause > memory issues for jobs that use all, or nearly all, of the heap already. > If we make POPartialAgg "spillable" (trigger compaction when memory reduction > is required), we would be much nicer to high-memory jobs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
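A standalone sketch of the trade-off being discussed: an in-memory aggregation buffer that compacts when a memory manager asks it to, instead of reserving a fixed percentage of the heap up front. The SpillCallback interface below is a simplified stand-in for Pig's Spillable (the method names and the size estimate are assumptions for illustration), not the real class.
{code}
// Illustrative only: compaction on request rather than a fixed memory budget.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface SpillCallback {
    long spill();            // asked to free memory when the manager sees pressure
    long getMemorySize();    // rough estimate of bytes currently held
}

class PartialAggBuffer implements SpillCallback {
    private final Map<String, List<Long>> pending = new HashMap<>();

    void add(String key, long value) {
        pending.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    @Override
    public long spill() {
        long before = getMemorySize();
        // "Compaction": collapse each buffered value list into its partial sum.
        for (Map.Entry<String, List<Long>> e : pending.entrySet()) {
            long sum = e.getValue().stream().mapToLong(Long::longValue).sum();
            List<Long> collapsed = new ArrayList<>();
            collapsed.add(sum);
            e.setValue(collapsed);
        }
        return before - getMemorySize();   // estimated bytes released
    }

    @Override
    public long getMemorySize() {
        // Crude estimate: ~32 bytes per buffered value.
        return pending.values().stream().mapToLong(List::size).sum() * 32L;
    }
}
{code}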
[jira] [Updated] (PIG-3071) update hcatalog jar and path to hbase storage handler har
[ https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-3071: --- Labels: hcatalog (was: ) > update hcatalog jar and path to hbase storage handler har > - > > Key: PIG-3071 > URL: https://issues.apache.org/jira/browse/PIG-3071 > Project: Pig > Issue Type: Bug >Reporter: Arpit Gupta > Labels: hcatalog > Attachments: PIG-3071.patch > > > Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar > name and the path to the hbase storage handler jar. > pig script should be updated to work with either version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2176) add logical plan assumption checker
[ https://issues.apache.org/jira/browse/PIG-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2176: --- Assignee: Thejas M Nair > add logical plan assumption checker > > > Key: PIG-2176 > URL: https://issues.apache.org/jira/browse/PIG-2176 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.9.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.10.0 > > Attachments: PIG-2176.1.patch, PIG-2176.2.patch > > > Pig expects certain things about LogicalPlan, and optimizer logic depends on > those to be true. Code that verifies that these assumptions are true will > help in catching issues early on. > Some of the assumptions that should be checked - > 1. All schemas have a valid uid (not -1). > 2. All fields in a schema have distinct uids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2959) Add a pig.cmd for Pig to run under Windows
[ https://issues.apache.org/jira/browse/PIG-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495092#comment-13495092 ] Thejas M Nair commented on PIG-2959: +1 > Add a pig.cmd for Pig to run under Windows > -- > > Key: PIG-2959 > URL: https://issues.apache.org/jira/browse/PIG-2959 > Project: Pig > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.11 > > Attachments: pig.cmd > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2980) documentation for DateTime datatype
[ https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494274#comment-13494274 ] Thejas M Nair commented on PIG-2980: bq. Yes, I mean ToDate('1970-01-01T00:00:00.000+00:00'). where users can specify a constant string to create a datetime object. Let me rephrase the description here. I think we can just remove datestamp from the constants table and add a note under the table that users should use the ToDate udf to generate datetime values from string constants. > documentation for DateTime datatype > --- > > Key: PIG-2980 > URL: https://issues.apache.org/jira/browse/PIG-2980 > Project: Pig > Issue Type: Bug > Components: documentation >Reporter: Thejas M Nair >Assignee: Zhijie Shen > Fix For: 0.11 > > Attachments: PIG-2980.patch > > > Documentation for new DateTime type needs to be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2980) documentation for DateTime datatype
[ https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492934#comment-13492934 ] Thejas M Nair commented on PIG-2980: Thanks for the patch Zhijie. It looks good. But it says that datestamp constants are supported, and I guess if you pass '1970-01-01T00:00:00.000+00:00' to Pig (say as an argument to a udf), I believe it would get interpreted as a string. I.e., we support chararray constants that can be cast to datetime, but not a datetime constant per se. Is that correct? (I think it makes sense to support datetime constants, using a format that does not cause ambiguity wrt the chararray type. But that would be another jira.) > documentation for DateTime datatype > --- > > Key: PIG-2980 > URL: https://issues.apache.org/jira/browse/PIG-2980 > Project: Pig > Issue Type: Bug > Components: documentation >Reporter: Thejas M Nair >Assignee: Zhijie Shen > Fix For: 0.11 > > Attachments: PIG-2980.patch > > > Documentation for new DateTime type needs to be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2981) add e2e tests for DateTime data type
[ https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486695#comment-13486695 ] Thejas M Nair commented on PIG-2981: I will try to do it this week, and get it into 0.11 . But for now putting a fix version of 0.12 , just in case. > add e2e tests for DateTime data type > - > > Key: PIG-2981 > URL: https://issues.apache.org/jira/browse/PIG-2981 > Project: Pig > Issue Type: Test >Reporter: Thejas M Nair > Fix For: 0.12 > > > e2e tests for DateTime datatype need to be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10
[ https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486693#comment-13486693 ] Thejas M Nair commented on PIG-2434: Unlinking from 0.11 . > investigate 5% slowdown in TPC-H Q6 query in 0.10 > - > > Key: PIG-2434 > URL: https://issues.apache.org/jira/browse/PIG-2434 > Project: Pig > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Thejas M Nair > > 0.10 is slower than 0.9 by around 5% for TPC-H Q6 query as per observation in > https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461 > . > This needs to be investigated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2981) add e2e tests for DateTime data type
[ https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2981: --- Fix Version/s: (was: 0.11) 0.12 > add e2e tests for DateTime data type > - > > Key: PIG-2981 > URL: https://issues.apache.org/jira/browse/PIG-2981 > Project: Pig > Issue Type: Test >Reporter: Thejas M Nair > Fix For: 0.12 > > > e2e tests for DateTime datatype need to be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10
[ https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2434: --- Fix Version/s: (was: 0.11) > investigate 5% slowdown in TPC-H Q6 query in 0.10 > - > > Key: PIG-2434 > URL: https://issues.apache.org/jira/browse/PIG-2434 > Project: Pig > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Thejas M Nair > > 0.10 is slower than 0.9 by around 5% for TPC-H Q6 query as per observation in > https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461 > . > This needs to be investigated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3007) support group-by collected for load funcs that don't implement CollectableLoadFunc
Thejas M Nair created PIG-3007: -- Summary: support group-by collected for load funcs that don't implement CollectableLoadFunc Key: PIG-3007 URL: https://issues.apache.org/jira/browse/PIG-3007 Project: Pig Issue Type: New Feature Reporter: Thejas M Nair group-by collected should be supported for all inputs that are sorted on the group-by keys. To ensure that a map task gets all records for a group key, indexing can be done to determine the key at which it should start processing, and whether it should also read from the next split to pick up the remaining records for the last group-by key in its original split. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
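For context, a hedged sketch of the current requirement from a loader's point of view: today a LoadFunc has to implement CollectableLoadFunc before "group ... using 'collected'" is accepted, which is the restriction this issue proposes to relax. The class below is hypothetical and assumes the input layout already keeps each group key within one split.
{code}
import java.io.IOException;
import org.apache.pig.CollectableLoadFunc;
import org.apache.pig.builtin.PigStorage;

// Hypothetical loader: extends PigStorage and opts in to collected group-by.
public class SortedKeyStorage extends PigStorage implements CollectableLoadFunc {
    @Override
    public void ensureAllKeyInstancesInSameSplit() throws IOException {
        // Assumption for this sketch: the input files are already laid out so
        // that all records for a group key fall within a single split, so there
        // is nothing to adjust here. A real loader may need to merge or re-plan
        // its splits to guarantee this.
    }
}
{code}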
[jira] [Commented] (PIG-2982) add unit tests for DateTime type that test setting timezone
[ https://issues.apache.org/jira/browse/PIG-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482047#comment-13482047 ] Thejas M Nair commented on PIG-2982: +1. Will commit after running tests. > add unit tests for DateTime type that test setting timezone > --- > > Key: PIG-2982 > URL: https://issues.apache.org/jira/browse/PIG-2982 > Project: Pig > Issue Type: Test >Reporter: Thejas M Nair >Assignee: Zhijie Shen > Fix For: 0.11 > > Attachments: PIG-2982.patch > > > The default timezone can be set for the new DateTime type. We need to add > unit tests that test this functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2980) documentation for DateTime datatype
[ https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481936#comment-13481936 ] Thejas M Nair commented on PIG-2980: Olga, Zhijie is planning to work on this. If you can help with the formatting, that would be great! > documentation for DateTime datatype > --- > > Key: PIG-2980 > URL: https://issues.apache.org/jira/browse/PIG-2980 > Project: Pig > Issue Type: Bug > Components: documentation >Reporter: Thejas M Nair > Fix For: 0.11 > > > Documentation for new DateTime type needs to be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved PIG-1314. Resolution: Fixed As Dmitriy suggested, closing this jira and opened new ones for remaining work - PIG-2980, PIG-2981, PIG-2982 . > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, > PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, > PIG-1314-7.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2982) add unit tests for DateTime type that test setting timezone
Thejas M Nair created PIG-2982: -- Summary: add unit tests for DateTime type that test setting timezone Key: PIG-2982 URL: https://issues.apache.org/jira/browse/PIG-2982 Project: Pig Issue Type: Test Reporter: Thejas M Nair Fix For: 0.11 The default timezone can be set for the new DateTime type. We need to add unit tests that test this functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2981) add e2e tests for DateTime data type
Thejas M Nair created PIG-2981: -- Summary: add e2e tests for DateTime data type Key: PIG-2981 URL: https://issues.apache.org/jira/browse/PIG-2981 Project: Pig Issue Type: Test Reporter: Thejas M Nair Fix For: 0.11 e2e tests for DateTime datatype need to be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2980) documentation for DateTime datatype
Thejas M Nair created PIG-2980: -- Summary: documentation for DateTime datatype Key: PIG-2980 URL: https://issues.apache.org/jira/browse/PIG-2980 Project: Pig Issue Type: Bug Components: documentation Reporter: Thejas M Nair Fix For: 0.11 Documentation for new DateTime type needs to be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2910) Add function to read schema from output of Schema.toString()
[ https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2910: --- Resolution: Fixed Status: Resolved (was: Patch Available) +1 Patch committed to trunk. Eli, Thanks for the patch! > Add function to read schema from outout of Schema.toString() > > > Key: PIG-2910 > URL: https://issues.apache.org/jira/browse/PIG-2910 > Project: Pig > Issue Type: Improvement > Components: impl, parser >Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1 >Reporter: Russell Jurney >Assignee: Eli Reisman > Labels: newbie > Fix For: 0.11 > > Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, > PIG-2910-4.patch > > > I want to toString() schemas and send them to the backend via UDFContext. At > the moment this requires writing your own toString() method that > Utils.getSchemaFromString() can read. Making a readable schema for the > backend would be an improvement. > I spoke with Thejas, who believes this is a bug. The workaround for the > moment is, for example: > String schemaString = inputSchema.toString().substring(1, > inputSchema.toString().length() - 1); > // Set the input schema for processing > UDFContext context = UDFContext.getUDFContext(); > Properties udfProp = context.getUDFProperties(this.getClass()); > udfProp.setProperty("horton.json.udf.schema", schemaString); > ... > schema = Utils.getSchemaFromString(strSchema); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2910) Add function to read schema from output of Schema.toString()
[ https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2910: --- Fix Version/s: (was: 0.10.1) Status: Patch Available (was: Open) > Add function to read schema from outout of Schema.toString() > > > Key: PIG-2910 > URL: https://issues.apache.org/jira/browse/PIG-2910 > Project: Pig > Issue Type: Improvement > Components: impl, parser >Affects Versions: 0.10.0, 0.9.2, 0.11, 0.10.1 >Reporter: Russell Jurney >Assignee: Eli Reisman > Labels: newbie > Fix For: 0.11 > > Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, > PIG-2910-4.patch > > > I want to toString() schemas and send them to the backend via UDFContext. At > the moment this requires writing your own toString() method that > Utils.getSchemaFromString() can read. Making a readable schema for the > backend would be an improvement. > I spoke with Thejas, who believes this is a bug. The workaround for the > moment is, for example: > String schemaString = inputSchema.toString().substring(1, > inputSchema.toString().length() - 1); > // Set the input schema for processing > UDFContext context = UDFContext.getUDFContext(); > Properties udfProp = context.getUDFProperties(this.getClass()); > udfProp.setProperty("horton.json.udf.schema", schemaString); > ... > schema = Utils.getSchemaFromString(strSchema); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2910) Add function to read schema from output of Schema.toString()
[ https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2910: --- Summary: Add function to read schema from outout of Schema.toString() (was: Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()) > Add function to read schema from outout of Schema.toString() > > > Key: PIG-2910 > URL: https://issues.apache.org/jira/browse/PIG-2910 > Project: Pig > Issue Type: Improvement > Components: impl, parser >Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1 >Reporter: Russell Jurney >Assignee: Eli Reisman > Labels: newbie > Fix For: 0.11, 0.10.1 > > Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, > PIG-2910-4.patch > > > I want to toString() schemas and send them to the backend via UDFContext. At > the moment this requires writing your own toString() method that > Utils.getSchemaFromString() can read. Making a readable schema for the > backend would be an improvement. > I spoke with Thejas, who believes this is a bug. The workaround for the > moment is, for example: > String schemaString = inputSchema.toString().substring(1, > inputSchema.toString().length() - 1); > // Set the input schema for processing > UDFContext context = UDFContext.getUDFContext(); > Properties udfProp = context.getUDFProperties(this.getClass()); > udfProp.setProperty("horton.json.udf.schema", schemaString); > ... > schema = Utils.getSchemaFromString(strSchema); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()
[ https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2910: --- Issue Type: Improvement (was: Bug) > Make toString() methods on Schema and FieldSchema be readable by > Utils.getSchemaFromString() > > > Key: PIG-2910 > URL: https://issues.apache.org/jira/browse/PIG-2910 > Project: Pig > Issue Type: Improvement > Components: impl, parser >Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1 >Reporter: Russell Jurney >Assignee: Eli Reisman > Labels: newbie > Fix For: 0.11, 0.10.1 > > Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, > PIG-2910-4.patch > > > I want to toString() schemas and send them to the backend via UDFContext. At > the moment this requires writing your own toString() method that > Utils.getSchemaFromString() can read. Making a readable schema for the > backend would be an improvement. > I spoke with Thejas, who believes this is a bug. The workaround for the > moment is, for example: > String schemaString = inputSchema.toString().substring(1, > inputSchema.toString().length() - 1); > // Set the input schema for processing > UDFContext context = UDFContext.getUDFContext(); > Properties udfProp = context.getUDFProperties(this.getClass()); > udfProp.setProperty("horton.json.udf.schema", schemaString); > ... > schema = Utils.getSchemaFromString(strSchema); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()
[ https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2910: --- Assignee: Eli Reisman (was: Thejas M Nair) > Make toString() methods on Schema and FieldSchema be readable by > Utils.getSchemaFromString() > > > Key: PIG-2910 > URL: https://issues.apache.org/jira/browse/PIG-2910 > Project: Pig > Issue Type: Bug > Components: impl, parser >Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1 >Reporter: Russell Jurney >Assignee: Eli Reisman > Labels: newbie > Fix For: 0.11, 0.10.1 > > Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, > PIG-2910-4.patch > > > I want to toString() schemas and send them to the backend via UDFContext. At > the moment this requires writing your own toString() method that > Utils.getSchemaFromString() can read. Making a readable schema for the > backend would be an improvement. > I spoke with Thejas, who believes this is a bug. The workaround for the > moment is, for example: > String schemaString = inputSchema.toString().substring(1, > inputSchema.toString().length() - 1); > // Set the input schema for processing > UDFContext context = UDFContext.getUDFContext(); > Properties udfProp = context.getUDFProperties(this.getClass()); > udfProp.setProperty("horton.json.udf.schema", schemaString); > ... > schema = Utils.getSchemaFromString(strSchema); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()
[ https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473535#comment-13473535 ] Thejas M Nair commented on PIG-2910: Yes, The changes in 2910-3 patch look good. Can you please add a test case, and also add a comment that this schema string has "{}" around it, and that is why the substring is being done ? > Make toString() methods on Schema and FieldSchema be readable by > Utils.getSchemaFromString() > > > Key: PIG-2910 > URL: https://issues.apache.org/jira/browse/PIG-2910 > Project: Pig > Issue Type: Bug > Components: impl, parser >Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1 >Reporter: Russell Jurney >Assignee: Thejas M Nair > Labels: newbie > Fix For: 0.11, 0.10.1 > > Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch > > > I want to toString() schemas and send them to the backend via UDFContext. At > the moment this requires writing your own toString() method that > Utils.getSchemaFromString() can read. Making a readable schema for the > backend would be an improvement. > I spoke with Thejas, who believes this is a bug. The workaround for the > moment is, for example: > String schemaString = inputSchema.toString().substring(1, > inputSchema.toString().length() - 1); > // Set the input schema for processing > UDFContext context = UDFContext.getUDFContext(); > Properties udfProp = context.getUDFProperties(this.getClass()); > udfProp.setProperty("horton.json.udf.schema", schemaString); > ... > schema = Utils.getSchemaFromString(strSchema); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()
[ https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472941#comment-13472941 ] Thejas M Nair commented on PIG-2910: The extra enclosing characters added by Schema.toString() are curly braces. I believe this is because the schema, when thought of as the schema of a relation, is the schema of a bag. I don't think the PIG-2910-2.patch will fix the issue, but changing the behavior of either Schema.toString() or Utils.getSchemaFromString() would break backward compatibility. I think we should just add a new function, Utils.getSchemaFromBagSchemaString(), and document that people should use it to get the schema back from the output of Schema.toString(). Using the input schema of a UDF during UDF execution is not very common, so I don't think we should serialize it for all UDFs. > Make toString() methods on Schema and FieldSchema be readable by > Utils.getSchemaFromString() > > > Key: PIG-2910 > URL: https://issues.apache.org/jira/browse/PIG-2910 > Project: Pig > Issue Type: Bug > Components: impl, parser >Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1 >Reporter: Russell Jurney >Assignee: Thejas M Nair > Labels: newbie > Fix For: 0.11, 0.10.1 > > Attachments: PIG-2910-1.patch, PIG-2910-2.patch > > > I want to toString() schemas and send them to the backend via UDFContext. At > the moment this requires writing your own toString() method that > Utils.getSchemaFromString() can read. Making a readable schema for the > backend would be an improvement. > I spoke with Thejas, who believes this is a bug. The workaround for the > moment is, for example: > String schemaString = inputSchema.toString().substring(1, > inputSchema.toString().length() - 1); > // Set the input schema for processing > UDFContext context = UDFContext.getUDFContext(); > Properties udfProp = context.getUDFProperties(this.getClass()); > udfProp.setProperty("horton.json.udf.schema", schemaString); > ... > schema = Utils.getSchemaFromString(strSchema); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
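A hedged sketch of the round trip being proposed: Schema.toString() wraps the schema in curly braces (a bag), so a dedicated parsing helper is suggested instead of the substring workaround quoted in the issue description. The helper name below is taken from the comment above and assumed here; it is not part of the released API at the time of this comment.
{code}
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

public class SchemaRoundTripSketch {
    public static void main(String[] args) throws Exception {
        Schema schema = Utils.getSchemaFromString("name:chararray, age:int");
        // toString() adds enclosing "{...}" because a relation's schema is a bag schema.
        String serialized = schema.toString();
        // Proposed helper from the comment above (name assumed), which would
        // accept the bag-wrapped form directly instead of the substring hack.
        Schema restored = Utils.getSchemaFromBagSchemaString(serialized);
        System.out.println(restored);
    }
}
{code}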
[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0
[ https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469592#comment-13469592 ] Thejas M Nair commented on PIG-2769: I have assigned it to you. I have also added you to contributors list, now you should be able to assign jiras to yourself. > a simple logic causes very long compiling time on pig 0.10.0 > > > Key: PIG-2769 > URL: https://issues.apache.org/jira/browse/PIG-2769 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.10.0 > Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported) >Reporter: Dan Li >Assignee: Timothy Chen > Fix For: 0.11 > > Attachments: case1.tar > > > We found the following simple logic will cause very long compiling time for > pig 0.10.0, while using pig 0.8.1, everything is fine. > A = load 'A.txt' using PigStorage() AS (m: int); > B = FOREACH A { > days_str = (chararray) > (m == 1 ? 31: > (m == 2 ? 28: > (m == 3 ? 31: > (m == 4 ? 30: > (m == 5 ? 31: > (m == 6 ? 30: > (m == 7 ? 31: > (m == 8 ? 31: > (m == 9 ? 30: > (m == 10 ? 31: > (m == 11 ? 30:31))); > GENERATE >days_str as days_str; > } > store B into 'B'; > and here's a simple input file example: A.txt > 1 > 2 > 3 > The pig version we used in the test > Apache Pig version 0.10.0-SNAPSHOT (rexported) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0
[ https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2769: --- Assignee: Timothy Chen > a simple logic causes very long compiling time on pig 0.10.0 > > > Key: PIG-2769 > URL: https://issues.apache.org/jira/browse/PIG-2769 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.10.0 > Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported) >Reporter: Dan Li >Assignee: Timothy Chen > Fix For: 0.11 > > Attachments: case1.tar > > > We found the following simple logic will cause very long compiling time for > pig 0.10.0, while using pig 0.8.1, everything is fine. > A = load 'A.txt' using PigStorage() AS (m: int); > B = FOREACH A { > days_str = (chararray) > (m == 1 ? 31: > (m == 2 ? 28: > (m == 3 ? 31: > (m == 4 ? 30: > (m == 5 ? 31: > (m == 6 ? 30: > (m == 7 ? 31: > (m == 8 ? 31: > (m == 9 ? 30: > (m == 10 ? 31: > (m == 11 ? 30:31))); > GENERATE >days_str as days_str; > } > store B into 'B'; > and here's a simple input file example: A.txt > 1 > 2 > 3 > The pig version we used in the test > Apache Pig version 0.10.0-SNAPSHOT (rexported) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
[ https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2911: --- Resolution: Invalid Status: Resolved (was: Patch Available) Sorry, I created the bug in the wrong project! :) > GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator > > > Key: PIG-2911 > URL: https://issues.apache.org/jira/browse/PIG-2911 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: PIG-2911.1.patch > > > This causes testcase skewjoin.q to fail on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
[ https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2911: --- Status: Patch Available (was: Open) > GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator > > > Key: PIG-2911 > URL: https://issues.apache.org/jira/browse/PIG-2911 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: PIG-2911.1.patch > > > This causes testcase skewjoin.q to fail on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
[ https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2911: --- Attachment: PIG-2911.1.patch > GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator > > > Key: PIG-2911 > URL: https://issues.apache.org/jira/browse/PIG-2911 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: PIG-2911.1.patch > > > This causes testcase skewjoin.q to fail on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
Thejas M Nair created PIG-2911: -- Summary: GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator Key: PIG-2911 URL: https://issues.apache.org/jira/browse/PIG-2911 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: PIG-2911.1.patch This causes testcase skewjoin.q to fail on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2895: --- Resolution: Fixed Status: Resolved (was: Patch Available) patch committed to trunk > jodatime jar missing in pig-withouthadoop.jar > - > > Key: PIG-2895 > URL: https://issues.apache.org/jira/browse/PIG-2895 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.11 > > Attachments: PIG-2895.1.patch, PIG-2895.2.patch > > > jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar > is used, pig will fail with class not found error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2895: --- Status: Patch Available (was: Open) Patch tested against a multi-node hadoop cluster. > jodatime jar missing in pig-withouthadoop.jar > - > > Key: PIG-2895 > URL: https://issues.apache.org/jira/browse/PIG-2895 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.11 > > Attachments: PIG-2895.1.patch, PIG-2895.2.patch > > > jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar > is used, pig will fail with class not found error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2895: --- Attachment: PIG-2895.2.patch PIG-2895.2.patch - adds "org/joda/time" to pigPackagesToSend. Thanks Julien for the directions! > jodatime jar missing in pig-withouthadoop.jar > - > > Key: PIG-2895 > URL: https://issues.apache.org/jira/browse/PIG-2895 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.11 > > Attachments: PIG-2895.1.patch, PIG-2895.2.patch > > > jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar > is used, pig will fail with class not found error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (PIG-2893) fix DBStorage compile issue
[ https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair reassigned PIG-2893: -- Assignee: Thejas M Nair > fix DBStorage compile issue > --- > > Key: PIG-2893 > URL: https://issues.apache.org/jira/browse/PIG-2893 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.11 > > Attachments: PIG-2893.1.patch > > > DBStorage does not compile after the datetime patch was committed. The joda > datetime was passed as argument to java.sql.PreparedStatement.setDate() > instead of java.sql.Date . -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443630#comment-13443630 ] Thejas M Nair commented on PIG-1314: Yes, that was not intentional. Deleted JobControlCompiler.java.orig in svn. > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, > PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, > PIG-1314-7.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2895: --- Attachment: PIG-2895.1.patch > jodatime jar missing in pig-withouthadoop.jar > - > > Key: PIG-2895 > URL: https://issues.apache.org/jira/browse/PIG-2895 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.11 > > Attachments: PIG-2895.1.patch > > > jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar > is used, pig will fail with class not found error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar
Thejas M Nair created PIG-2895: -- Summary: jodatime jar missing in pig-withouthadoop.jar Key: PIG-2895 URL: https://issues.apache.org/jira/browse/PIG-2895 Project: Pig Issue Type: Sub-task Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.11 jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar is used, pig will fail with class not found error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2893) fix DBStorage compile issue
[ https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442922#comment-13442922 ] Thejas M Nair commented on PIG-2893: The compile error was - [javac] symbol : method setDate(int,java.util.Date) [javac] location: interface java.sql.PreparedStatement [javac] ps.setDate(sqlPos, ((DateTime) field).toDate()); > fix DBStorage compile issue > --- > > Key: PIG-2893 > URL: https://issues.apache.org/jira/browse/PIG-2893 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair > Attachments: PIG-2893.1.patch > > > DBStorage does not compile after the datetime patch was committed. The joda > datetime was passed as argument to java.sql.PreparedStatement.setDate() > instead of java.sql.Date . -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
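A hedged sketch of the kind of conversion the fix needs: Joda-Time's DateTime.toDate() returns java.util.Date, while PreparedStatement.setDate() expects java.sql.Date, so the millisecond value has to be rewrapped explicitly. The helper and class names below are illustrative, not the exact patch.
{code}
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.joda.time.DateTime;

public class DateTimeBindingSketch {
    // Illustrative helper: bind a Pig datetime (Joda DateTime) into a SQL DATE parameter.
    static void bindDateTime(PreparedStatement ps, int sqlPos, Object field)
            throws SQLException {
        DateTime dt = (DateTime) field;
        ps.setDate(sqlPos, new java.sql.Date(dt.getMillis()));
    }
}
{code}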
[jira] [Created] (PIG-2893) fix DBStorage compile issue
Thejas M Nair created PIG-2893: -- Summary: fix DBStorage compile issue Key: PIG-2893 URL: https://issues.apache.org/jira/browse/PIG-2893 Project: Pig Issue Type: Sub-task Reporter: Thejas M Nair Attachments: PIG-2893.1.patch DBStorage does not compile after the datetime patch was committed. The joda datetime was passed as argument to java.sql.PreparedStatement.setDate() instead of java.sql.Date . -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2893) fix DBStorage compile issue
[ https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2893: --- Attachment: PIG-2893.1.patch PIG-2893.1.patch - fix for compile issue, updates to test case to use datetime type. > fix DBStorage compile issue > --- > > Key: PIG-2893 > URL: https://issues.apache.org/jira/browse/PIG-2893 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair > Attachments: PIG-2893.1.patch > > > DBStorage does not compile after the datetime patch was committed. The joda > datetime was passed as argument to java.sql.PreparedStatement.setDate() > instead of java.sql.Date . -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440858#comment-13440858 ] Thejas M Nair commented on PIG-1314: We also need to have some test cases that set the timezone property. This might not be easy to do in the e2e framework, so unit test cases are better candidate for this. Please let me know if you need any help. > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, > PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, > PIG-1314-7.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440851#comment-13440851 ] Thejas M Nair commented on PIG-1314: PIG-1314-7.patch committed to trunk! Thanks Zhijie. We need to update the documentation regarding this change. Can you please upload a new patch for that ? To see generated docs, run - ant -Dforrest.home= docs. The files to be edited are under - trunk/src/docs/src/documentation/ . We should also add a few end to end test cases for datetime. See https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-EndtoendTesting . We should have a few queries that do some of the basic operations on date time, and queries that have order-by , group and join on date fields. These can be submitted as multiple patches. > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, > PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, > PIG-1314-7.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version
[ https://issues.apache.org/jira/browse/PIG-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2811: --- Resolution: Fixed Status: Resolved (was: Patch Available) Fixed in the PIG-1314 patch. > Updating .eclipse.templates/.classpath with the Newest Jython Version > - > > Key: PIG-2811 > URL: https://issues.apache.org/jira/browse/PIG-2811 > Project: Pig > Issue Type: Bug > Components: tools >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Trivial > Fix For: 0.11 > > Attachments: PIG-2811.patch > > > Jython library version has been upgraded to 2.5.2 by the PIG-2665 patch, but > the related modification is not made in the Eclipse template file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2662) skew join does not honor its config parameters
[ https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440789#comment-13440789 ] Thejas M Nair commented on PIG-2662: Koji, What OS, JVM are you using ? > skew join does not honor its config parameters > -- > > Key: PIG-2662 > URL: https://issues.apache.org/jira/browse/PIG-2662 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Thejas M Nair >Assignee: Rajesh Balamohan > Fix For: 0.11 > > Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch > > > Skew join can be configured using pig.sksampler.samplerate and > pig.skewedjoin.reduce.memusage. But the section of code the retrieves the > config values from properties (PoissonSampleLoader.computeSamples) is not > getting called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2662) skew join does not honor its config parameters
[ https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2662: --- Fix Version/s: 0.11 Assignee: Rajesh Balamohan Status: Patch Available (was: Open) > skew join does not honor its config parameters > -- > > Key: PIG-2662 > URL: https://issues.apache.org/jira/browse/PIG-2662 > Project: Pig > Issue Type: Bug >Affects Versions: 0.10.0, 0.9.2 >Reporter: Thejas M Nair >Assignee: Rajesh Balamohan > Fix For: 0.11 > > Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch > > > Skew join can be configured using pig.sksampler.samplerate and > pig.skewedjoin.reduce.memusage. But the section of code the retrieves the > config values from properties (PoissonSampleLoader.computeSamples) is not > getting called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2662) skew join does not honor its config parameters
[ https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2662: --- Resolution: Fixed Status: Resolved (was: Patch Available) +1 Patch committed to trunk. Thanks Rajesh! > skew join does not honor its config parameters > -- > > Key: PIG-2662 > URL: https://issues.apache.org/jira/browse/PIG-2662 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Thejas M Nair >Assignee: Rajesh Balamohan > Fix For: 0.11 > > Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch > > > Skew join can be configured using pig.sksampler.samplerate and > pig.skewedjoin.reduce.memusage. But the section of code the retrieves the > config values from properties (PoissonSampleLoader.computeSamples) is not > getting called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
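With the fix committed, the two knobs named in this issue should take effect again. A minimal sketch of setting them from a client follows, assuming illustrative values and input paths; these are not recommended settings.
{code}
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class SkewJoinConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Property names taken from this issue; the values are illustrative only.
        props.setProperty("pig.sksampler.samplerate", "100");
        props.setProperty("pig.skewedjoin.reduce.memusage", "0.3");
        PigServer pig = new PigServer(ExecType.LOCAL, props);
        pig.registerQuery("A = load 'left.txt' as (k:chararray, v:int);");
        pig.registerQuery("B = load 'right.txt' as (k:chararray, w:int);");
        pig.registerQuery("J = join A by k, B by k using 'skewed';");
        pig.store("J", "joined");
    }
}
{code}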
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434733#comment-13434733 ] Thejas M Nair commented on PIG-1314: bq. 2. According to your last response, I'm not clear how the default timezone of client can be sent to the server with the code. In my opinion, the default timezone should be specified on the server side by configuration, which should be taken care of by administrators. How do you think about this. I believe you should be able to set the default timezone property in PigContext constructor, and also let user override the default. In backend, you can access the value using something like - PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz"). > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, > PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
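A hedged sketch of the backend side described above: read the propagated property (the property name and the sJobConfInternal expression come from this comment) and apply it as Joda-Time's default zone before datetime values are created. The null guard for a missing value is an assumption of the sketch.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce;
import org.joda.time.DateTimeZone;

public class DefaultTimezoneSketch {
    // Apply the client's default timezone inside a map/reduce task, if it was propagated.
    static void applyClientTimeZone() {
        Configuration conf = PigMapReduce.sJobConfInternal.get();
        String tz = (conf == null) ? null : conf.get("pig.datetime.default.tz");
        if (tz != null && tz.length() > 0) {
            DateTimeZone.setDefault(DateTimeZone.forID(tz));
        }
    }
}
{code}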
[jira] [Commented] (PIG-2662) skew join does not honor its config parameters
[ https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434294#comment-13434294 ] Thejas M Nair commented on PIG-2662: Rajesh, With the patch, TestPoissonSampleLoader test cases fail. Can you please take a look ? Please let me know if you need any help with that. > skew join does not honor its config parameters > -- > > Key: PIG-2662 > URL: https://issues.apache.org/jira/browse/PIG-2662 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Thejas M Nair > Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch > > > Skew join can be configured using pig.sksampler.samplerate and > pig.skewedjoin.reduce.memusage. But the section of code the retrieves the > config values from properties (PoissonSampleLoader.computeSamples) is not > getting called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2662) skew join does not honor its config parameters
[ https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2662: --- Attachment: PIG-2662.2.patch PIG-2662.2.patch - This patch fixes compile error in previous one (conf variable is not declared). Running tests with this one. > skew join does not honor its config parameters > -- > > Key: PIG-2662 > URL: https://issues.apache.org/jira/browse/PIG-2662 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Thejas M Nair > Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch > > > Skew join can be configured using pig.sksampler.samplerate and > pig.skewedjoin.reduce.memusage. But the section of code the retrieves the > config values from properties (PoissonSampleLoader.computeSamples) is not > getting called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428458#comment-13428458 ] Thejas M Nair commented on PIG-1314: bq. 1. You've mentioned that we need to propagate the timezone from the client to backend, where the udfs get executed. How the timezone should be propagated to the backend, which I assume the machine that runs the code? Yes bq. Previously I made the timezone setting in pig.properties, which will be loaded when PigServer runs, such that the default timezone will be set. Consequently, if a datetime object is created without specifying the timezone, the default one will be used. However, do you mean some other way? It is possible that some of the task nodes might be misconfigured and have different default time zone. In such cases, the results won't be what you want and it will be very difficult to debug. So the default timezone on the client should be used in the nodes as well. bq. I convert the location-based timezone to the utc-offset one and only use utc-offset style internally. Therefore, the aforementioned two equal datetime objects will not be mis-treated. Sounds good. > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch, > PIG-1314-4.patch, joda_vs_builtin.zip > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively
[ https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424140#comment-13424140 ] Thejas M Nair commented on PIG-2829: Thanks for the benchmark Jie. Clearly, partial-agg is working better than combiner. Can you also run some benchmarks with combiner turned off, so that we can verify the appropriate value for pig.exec.mapPartAgg.minReduction - ||query || combiner off, partial-agg off || combiner off, partial-agg on || |g-by with reduction by 3 | | | |g-by with reduction by 2| | | > Use partial aggregation more aggresively > > > Key: PIG-2829 > URL: https://issues.apache.org/jira/browse/PIG-2829 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.10.0 >Reporter: Jie Li > Attachments: 2829.1.patch, 2829.2.patch, 2829.separate.options.patch, > pigmix-10G.png, tpch-10G.png > > > Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature > in Pig 0.10 that will perform aggregation within map function. The main > advantage against combiner is it avoids de/serializing and sorting the data, > and it can auto disable itself if the data reduction rate is low. Currently > it's disabled by default. > To leverage the power of PartialAgg more aggressively, several things need to > be revisited: > 1. The threshold of auto-disabling. Currently each mapper looks at first 1k > (hard-coded) records to see if there's enough data size reduction (defaults > to 10x, configurable). The check would happen earlier if the hash table gets > full before processing the 1k records (hash table size is controlled by > pig.cachedbag.memusage). We might want to relax these thresholds. > 2. Dependency on the combiner. Currently the PartialAgg won't work without a > combiner following it, so we need to provide separate options to enable each > independently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
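A sketch of the benchmark configuration being requested: partial (in-map) aggregation on and the combiner off. Only pig.exec.mapPartAgg.minReduction appears verbatim in this issue; the other property names and all values are assumptions for illustration.
{code}
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PartialAggBenchmarkSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("pig.exec.mapPartAgg", "true");           // assumed enable flag for partial aggregation
        props.setProperty("pig.exec.mapPartAgg.minReduction", "3"); // named in this issue; value illustrative
        props.setProperty("pig.exec.nocombiner", "true");           // assumed flag to turn the combiner off
        PigServer pig = new PigServer(ExecType.LOCAL, props);
        pig.registerQuery("A = load 'input.txt' as (k:chararray, v:long);");
        pig.registerQuery("G = group A by k;");
        pig.registerQuery("S = foreach G generate group, SUM(A.v);");
        pig.store("S", "sums");
    }
}
{code}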
[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively
[ https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423602#comment-13423602 ] Thejas M Nair commented on PIG-2829: I will review the patch soon. Some comments regarding the default configuration - bq. 2: changes existing default values: After thinking of the multi-query use case, where you can have multiple POPartialAgg operators in a map task, I am having second thoughts on turning partial agg on by default. Can you try these settings queries where there are around 10+ group+agg that get combined into single MR job ? Maybe we should address the potential OOM issues for this use case before we change the defaults. This is likely to be become a bigger issue when we use 100k records to decide to turn on/off the partial aggregation. bq. 3: adds a property pig.exec.mapPartAgg.reduction.checkinterval which defaults to 100k, so after processing every 100k records mapagg will check the reduction rate to see if it should be disabled. Previously we only look at first 1000 records. Can you do some benchmarks to see if there is any noticeable difference in runtime because of the delay in turning mapPartAgg off ? > Use partial aggregation more aggresively > > > Key: PIG-2829 > URL: https://issues.apache.org/jira/browse/PIG-2829 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.10.0 >Reporter: Jie Li > Attachments: 2829.1.patch, 2829.separate.options.patch, > pigmix-10G.png, tpch-10G.png > > > Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature > in Pig 0.10 that will perform aggregation within map function. The main > advantage against combiner is it avoids de/serializing and sorting the data, > and it can auto disable itself if the data reduction rate is low. Currently > it's disabled by default. > To leverage the power of PartialAgg more aggressively, several things need to > be revisited: > 1. The threshold of auto-disabling. Currently each mapper looks at first 1k > (hard-coded) records to see if there's enough data size reduction (defaults > to 10x, configurable). The check would happen earlier if the hash table gets > full before processing the 1k records (hash table size is controlled by > pig.cachedbag.memusage). We might want to relax these thresholds. > 2. Dependency on the combiner. Currently the PartialAgg won't work without a > combiner following it, so we need to provide separate options to enable each > independently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2826) Training link on front page no longer points to Pig training
[ https://issues.apache.org/jira/browse/PIG-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419589#comment-13419589 ] Thejas M Nair commented on PIG-2826: +1 > Training link on front page no longer points to Pig training > > > Key: PIG-2826 > URL: https://issues.apache.org/jira/browse/PIG-2826 > Project: Pig > Issue Type: Bug > Components: site >Affects Versions: site >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: site > > Attachments: PIG-2826.patch > > > The training link on Pig's website used to point to a Pig specific video on > Cloudera's site. It now points to a list of all their videos. Also, at the > time they were the only ones providing training videos for Hadoop. Now other > vendors do as well. This link should be replaced by a link to a wiki page > where vendors who wish to can list their training resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412452#comment-13412452 ] Thejas M Nair commented on PIG-1314: Zhijie, I have added comments on your latest patch in https://reviews.apache.org/r/5414/. Yes, let's focus on test cases now, so that we can get an initial version committed. > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch, > joda_vs_builtin.zip > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405971#comment-13405971 ] Thejas M Nair commented on PIG-1314: PigStorage is meant to be a human-readable format, so that is another reason to store the timestamp as an ISO string, as you suggested. Yes, if the timezone is specified in the string, pig should use that value. But the timezone part and the time part of the datetime string should be optional. Does jodatime support that? > Add DateTime Support to Pig > --- > > Key: PIG-1314 > URL: https://issues.apache.org/jira/browse/PIG-1314 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.7.0 >Reporter: Russell Jurney >Assignee: Zhijie Shen > Labels: gsoc2012 > Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip > > Original Estimate: 672h > Remaining Estimate: 672h > > Hadoop/Pig are primarily used to parse log data, and most logs have a > timestamp component. Therefore Pig should support dates as a primitive. > Can someone familiar with adding types to pig comment on how hard this is? > We're looking at doing this, rather than use UDFs. Is this a patch that > would be accepted? > This is a candidate project for Google summer of code 2012. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2012 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
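For what it's worth, joda-time's ISO parsers do appear to handle both an optional time part and an optional zone offset; a small sketch (the sample timestamps below are made up):
{code}
import org.joda.time.format.DateTimeFormatter;
import org.joda.time.format.ISODateTimeFormat;

// Quick check of joda-time's handling of optional time/zone parts in ISO strings.
public class IsoDateTimeParseCheck {
    public static void main(String[] args) {
        // dateOptionalTimeParser() accepts a bare date, a date+time, or a
        // date+time+offset; withOffsetParsed() keeps the zone when it is present.
        DateTimeFormatter parser =
                ISODateTimeFormat.dateOptionalTimeParser().withOffsetParsed();

        System.out.println(parser.parseDateTime("2012-07-04"));                // time and zone omitted
        System.out.println(parser.parseDateTime("2012-07-04T10:15:30"));       // zone omitted
        System.out.println(parser.parseDateTime("2012-07-04T10:15:30+05:30")); // zone honored
    }
}
{code}
When the zone is omitted, the parser falls back to the default chronology's zone, which matches the behavior discussed above.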
[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys
[ https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403480#comment-13403480 ] Thejas M Nair commented on PIG-2774:
bq. we might have other operations queued up after the join
In the 2nd approach, the operations within the map task don't complicate things. But to handle a reduce after the merge-join, we would need to introduce another map task that does a union of the merge-join results. For example, if the merge-join is followed by a group+agg, then the following transformation to the plan would be needed:
Map(merge-join + group+agg ops) + Reduce(group+agg ops) =>
Map(merge-join wave 1 + group+agg ops) + Map(merge-join wave 2 + group+agg ops) + Map(union of 1st 2 maps) + Reduce(group+agg ops)
This transformation can't happen dynamically - we can't decide to skip the reduce while in the map phase. To handle this case, it looks like the first approach is the one that would actually work! The user or a metadata system could possibly identify the skew problem and recommend using a 'skew-merge' join the next time the query is run on similar data.
> Fix merge join to work with many duplicate left keys > > > Key: PIG-2774 > URL: https://issues.apache.org/jira/browse/PIG-2774 > Project: Pig > Issue Type: Bug >Reporter: Aneesh Sharma > > A merge join can throw an OOM error if the number of duplicate left tuples is > large as it accumulates all of them in memory. There are two solutions around > this problem: > 1. Serialize the accumulated tuples to disk if they exceed a certain size. > 2. Spit out join output periodically, and re-seek on the right hand side > index. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
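Just to spell out the shape of that rewrite, a purely illustrative data-structure sketch; the Plan class below is a hypothetical stand-in, not Pig's MR plan classes:
{code}
import java.util.Arrays;
import java.util.List;

// Illustrative-only model of the plan rewrite described above.
public class MergeJoinPlanRewriteSketch {
    static class Plan {
        final List<String> mapStages;
        final String reduceStage;
        Plan(List<String> mapStages, String reduceStage) {
            this.mapStages = mapStages;
            this.reduceStage = reduceStage;
        }
    }

    // Map(merge-join + group/agg ops) + Reduce(group/agg ops)
    //   => one map per merge-join wave, a map that unions the wave outputs,
    //      then the unchanged reduce.
    static Plan rewrite(Plan original) {
        return new Plan(
                Arrays.asList(
                        "merge-join wave 1 + group/agg ops",
                        "merge-join wave 2 + group/agg ops",
                        "union of wave 1 and wave 2 outputs"),
                original.reduceStage);
    }
}
{code}
The point of the comment is that this rewrite has to be chosen at compile time; it cannot be spliced in once the map phase is already running.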
[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys
[ https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403461#comment-13403461 ] Thejas M Nair commented on PIG-2774:
bq. I'd like to avoid having the user encode these details in the pig script.
Floating some more ideas - A more performant way of doing this would be to stop accumulating tuples for a join key value from the left relation into memory when a certain memory threshold is exceeded. Once the join of these tuples against the right relation is done, discard the accumulated left-relation tuples for the join key, load a new set, go back to the start of the records with this join key in the right relation, and continue. To go back more efficiently to the start of the join key in the right relation, we can keep track of its record offset. This approach will have no additional writes and less IO overall. The right relation block hopefully gets into the OS cache. But this approach can result in some map tasks being much slower than others. Another option is to write the left-side join key values that didn't fit into memory onto HDFS in separate files, one file for each chunk that is expected to fit into memory, and have another round of MR jobs do a merge join on these files. (I think Hive has a skew join impl along similar lines.) This would involve changing the MR plan at runtime.
> Fix merge join to work with many duplicate left keys > > > Key: PIG-2774 > URL: https://issues.apache.org/jira/browse/PIG-2774 > Project: Pig > Issue Type: Bug >Reporter: Aneesh Sharma > > A merge join can throw an OOM error if the number of duplicate left tuples is > large as it accumulates all of them in memory. There are two solutions around > this problem: > 1. Serialize the accumulated tuples to disk if they exceed a certain size. > 2. Spit out join output periodically, and re-seek on the right hand side > index. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
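A rough sketch of that first idea, bounded accumulation of left tuples plus re-seeking the right side. The reader interface, the threshold, and emit() are hypothetical stand-ins, not Pig APIs:
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of joining one key group with a memory bound on the accumulated left tuples.
public class BoundedMergeJoinSketch {
    interface RightReader {
        long offset();                  // current record offset in the right relation
        void seek(long offset);         // reposition to a previously recorded offset
        Object next(Object joinKey);    // next right tuple matching joinKey, or null
    }

    static final int MAX_LEFT_TUPLES_IN_MEMORY = 10_000; // stand-in for a memory threshold

    static void joinKeyGroup(Iterator<Object> leftForKey, Object key, RightReader right) {
        long keyStart = right.offset(); // remember where this key starts on the right side
        while (leftForKey.hasNext()) {
            // Accumulate at most one memory-bounded chunk of left tuples for this key.
            List<Object> chunk = new ArrayList<>();
            while (leftForKey.hasNext() && chunk.size() < MAX_LEFT_TUPLES_IN_MEMORY) {
                chunk.add(leftForKey.next());
            }
            // Re-read the right side from the recorded offset for every additional chunk.
            right.seek(keyStart);
            for (Object r; (r = right.next(key)) != null; ) {
                for (Object l : chunk) {
                    emit(l, r);
                }
            }
        }
    }

    static void emit(Object left, Object right) {
        // write the joined tuple to the output collector
    }
}
{code}
The extra cost is the repeated scan of the right-side key group, once per chunk, which is why the comment hopes the re-read stays in the OS cache.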
[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys
[ https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402722#comment-13402722 ] Thejas M Nair commented on PIG-2774: If the left-side relation's tuples for a value of the join key are serialized to disk, then for every value of the join key in the right relation, it will hit the disk. That will perform very poorly. It looks like what we need is something like a merge-skew join. That is, similar to skew join, sample the left side and partition the splits for map tasks based on the sampled information. > Fix merge join to work with many duplicate left keys > > > Key: PIG-2774 > URL: https://issues.apache.org/jira/browse/PIG-2774 > Project: Pig > Issue Type: Bug >Reporter: Aneesh Sharma > > A merge join can throw an OOM error if the number of duplicate left tuples is > large as it accumulates all of them in memory. There are two solutions around > this problem: > 1. Serialize the accumulated tuples to disk if they exceed a certain size. > 2. Spit out join output periodically, and re-seek on the right hand side > index. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
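A loose sketch of the sampling step such a 'merge-skew' join could use: estimate per-key weight from a sample of left-side keys and cut split boundaries so that no single map task is handed more than it can hold. Entirely illustrative; none of this exists in Pig, and the class and parameters are made up:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sampler for deciding split boundaries from left-side key skew.
public class SkewMergeSamplerSketch {
    /** Returns the keys at which a new left-side split should start. */
    static List<String> planSplitBoundaries(List<String> sampledKeys,
                                            long estimatedTotalTuples,
                                            long tuplesPerSplit) {
        // Approximate per-key tuple counts from the sample (TreeMap keeps key order).
        Map<String, Long> sampleCounts = new TreeMap<>();
        for (String k : sampledKeys) {
            sampleCounts.merge(k, 1L, Long::sum);
        }
        double scale = (double) estimatedTotalTuples / sampledKeys.size();

        List<String> boundaries = new ArrayList<>();
        long current = 0;
        for (Map.Entry<String, Long> e : sampleCounts.entrySet()) {
            long estimated = Math.round(e.getValue() * scale);
            if (current + estimated > tuplesPerSplit && current > 0) {
                boundaries.add(e.getKey()); // start a new split at this key
                current = 0;
            }
            current += estimated;
        }
        return boundaries;
    }
}
{code}
A real implementation would additionally need to spread a single very heavy key across several splits, which is the part that makes this a skew join rather than a plain range partitioner.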