[jira] [Commented] (PIG-5112) Cleanup pig-template.xml

2017-01-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838166#comment-15838166
 ] 

Thejas M Nair commented on PIG-5112:


+1

> Cleanup pig-template.xml
> 
>
> Key: PIG-5112
> URL: https://issues.apache.org/jira/browse/PIG-5112
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-5112-1.patch
>
>
> Several entries in pig-template.xml are outdated. Attaching a patch to remove or 
> update those entries. Later we should use ivy:makepom to generate pig.pom and the 
> lib dir; I will open a separate ticket for that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4972) StreamingIO_1 fail on perl 5.22

2016-08-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450030#comment-15450030
 ] 

Thejas M Nair commented on PIG-4972:


+1

> StreamingIO_1 fail on perl 5.22
> ---
>
> Key: PIG-4972
> URL: https://issues.apache.org/jira/browse/PIG-4972
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-4972-1.patch
>
>
> Saw StreamingIO_1 fail on a particular perl version due to a warning in 
> PigStreaming.pl. You can see the warning in any version of perl using "perl 
> -w":
> {code}
> defined(%hash) is deprecated at streaming/PigStreaming.pl line 76.
>   (Maybe you should just omit the defined()?)
> {code}
> In some particular versions of perl, the warning check is mandatory and the perl 
> script simply fails.





[jira] [Commented] (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2015-08-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694026#comment-14694026
 ] 

Thejas M Nair commented on PIG-1472:


I don't remember whether I looked into WritableUtils.writeVInt back then, or 
whether it was available with the pig version in use at the time (it's been 5 
years! :) )
Would using WritableUtils.writeVInt mean that an extra byte is needed to store 
the type, i.e. bag vs. map vs. tuple?
For complex types, the savings are more noticeable at smaller sizes. For a bag of 
size 32768, saving one byte won't be significant. However, for an int of size 
32768, saving one byte is significant.
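
To make the type-byte question concrete, here is a minimal Java sketch (not Pig's actual sedes code) of one way to avoid spending a separate byte on the width of the length field: fold that width into the type tag itself. The class name and the tag names and values (TINY_BYTES, SMALL_BYTES, BYTES) are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class CompactLengthSketch {
    // Hypothetical type tags (not Pig's real constants). Each tag says both
    // "this is a byte array" and how wide the length field that follows is.
    static final byte TINY_BYTES = 1;   // length stored in 1 byte  (0..255)
    static final byte SMALL_BYTES = 2;  // length stored in 2 bytes (256..65535)
    static final byte BYTES = 3;        // length stored in 4 bytes

    static void writeBytes(DataOutputStream out, byte[] value) throws IOException {
        int len = value.length;
        if (len < 256) {
            out.writeByte(TINY_BYTES);
            out.writeByte(len);
        } else if (len < 65536) {
            out.writeByte(SMALL_BYTES);
            out.writeShort(len);
        } else {
            out.writeByte(BYTES);
            out.writeInt(len);
        }
        out.write(value);
    }

    static byte[] readBytes(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        int len;
        switch (tag) {
            case TINY_BYTES:  len = in.readUnsignedByte();  break;
            case SMALL_BYTES: len = in.readUnsignedShort(); break;
            case BYTES:       len = in.readInt();           break;
            default: throw new IOException("unknown type tag: " + tag);
        }
        byte[] value = new byte[len];
        in.readFully(value);
        return value;
    }

    // Serializes one value and returns the encoded bytes.
    static byte[] encode(byte[] value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeBytes(out, value);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encode(new byte[10]);
        System.out.println("10-byte value encodes to " + encoded.length + " bytes");
        byte[] decoded = readBytes(new DataInputStream(new ByteArrayInputStream(encoded)));
        System.out.println("round trip length: " + decoded.length);
    }
}
```

With this layout a 10-byte value takes 12 bytes on the wire (tag + 1-byte length + payload), versus 15 with a tag plus a 4-byte int length. The price is that every type needing a compact length consumes several tag values.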


> Optimize serialization/deserialization between Map and Reduce and between MR 
> jobs
> -
>
> Key: PIG-1472
> URL: https://issues.apache.org/jira/browse/PIG-1472
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.4.patch, 
> PIG-1472.patch
>
>
> In certain types of pig queries most of the execution time is spent in 
> serializing/deserializing (sedes) records between Map and Reduce and between 
> MR jobs. 
> For example, if PigMix queries are modified to specify types for all the 
> fields in the load statement schema, some of the queries (L2, L3, L9, L10 in 
> pigmix v1) that have records with bags and maps being transmitted across map 
> or reduce boundaries run a lot longer (a runtime increase of a few times has 
> been seen).
> There are a few optimizations that have been shown to improve the performance 
> of sedes in my tests -
> 1. Use a smaller number of bytes to store the length of the column. For example, 
> if a bytearray is smaller than 255 bytes, a byte can be used to store the 
> length instead of the integer that is currently used.
> 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and 
> DataInput.readUTF. This reduces the cost of serialization by more than 1/2. 
> Zebra and BinStorage are known to use DefaultTuple sedes functionality. The 
> serialization format that these loaders use cannot change, so after the 
> optimization their format is going to be different from the format used 
> between M/R boundaries.





[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

> Error on ORC empty file without schema
> --
>
> Key: PIG-4624
> URL: https://issues.apache.org/jira/browse/PIG-4624
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.16.0, 0.15.1
>
> Attachments: PIG-4624.1.patch
>
>
> If ORC produces an empty file without schema (which ideally, it is not 
> supposed to), then pig query reading the data gives the following error - 
> "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" 





[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: (was: PIG-4624.1.patch)

> Error on ORC empty file without schema
> --
>
> Key: PIG-4624
> URL: https://issues.apache.org/jira/browse/PIG-4624
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.16.0, 0.15.1
>
> Attachments: PIG-4624.1.patch
>
>
> If ORC produces an empty file without schema (which ideally, it is not 
> supposed to), then pig query reading the data gives the following error - 
> "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" 





[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

> Error on ORC empty file without schema
> --
>
> Key: PIG-4624
> URL: https://issues.apache.org/jira/browse/PIG-4624
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.16.0, 0.15.1
>
> Attachments: PIG-4624.1.patch
>
>
> If ORC produces an empty file without schema (which ideally, it is not 
> supposed to), then pig query reading the data gives the following error - 
> "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" 





[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Summary: Error on ORC empty file without schema  (was: pig errors out on 
ORC empty file without schema)

> Error on ORC empty file without schema
> --
>
> Key: PIG-4624
> URL: https://issues.apache.org/jira/browse/PIG-4624
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.16.0, 0.15.1
>
> Attachments: PIG-4624.1.patch
>
>
> If ORC produces an empty file without schema (which ideally, it is not 
> supposed to), then pig query reading the data gives the following error - 
> "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" 





[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

The ORC issue should be addressed separately in ORC/Hive; however, it would be 
good if pig could handle this case for already generated files.

Attaching patch from [~daijy].


> pig errors out on ORC empty file without schema
> ---
>
> Key: PIG-4624
> URL: https://issues.apache.org/jira/browse/PIG-4624
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.16.0, 0.15.1
>
> Attachments: PIG-4624.1.patch
>
>
> If ORC produces an empty file without schema (which ideally, it is not 
> supposed to), then pig query reading the data gives the following error - 
> "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" 





[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Fix Version/s: 0.15.1
   0.16.0

> pig errors out on ORC empty file without schema
> ---
>
> Key: PIG-4624
> URL: https://issues.apache.org/jira/browse/PIG-4624
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.16.0, 0.15.1
>
>
> If ORC produces an empty file without schema (which ideally, it is not 
> supposed to), then pig query reading the data gives the following error - 
> "org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" 





[jira] [Created] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-4624:
--

 Summary: pig errors out on ORC empty file without schema
 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai


If ORC produces an empty file without schema (which ideally, it is not supposed 
to), then pig query reading the data gives the following error - 
"org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema" 





[jira] [Commented] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555165#comment-14555165
 ] 

Thejas M Nair commented on PIG-4556:


I believe s3 is used in non-local modes as well.
The change looks good to me. (For my reference, since it took me a few minutes to 
figure out: the real change is in the order in which the params are passed to 
ConfigurationUtil.mergeConf.)
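
As an illustration of why argument order matters for a merge of this kind, here is a minimal Java sketch using java.util.Properties. The mergeInto helper, the class name, and the key "example.key" are hypothetical; this does not reproduce Pig's ConfigurationUtil.mergeConf, it only shows that when one set of properties is copied over another, whichever is passed as the source wins on conflicting keys.

```java
import java.util.Properties;

final class MergeOrderSketch {
    // Hypothetical helper, not Pig's ConfigurationUtil.mergeConf: copies every
    // entry of 'source' into 'dest', overwriting keys that already exist.
    static void mergeInto(Properties dest, Properties source) {
        for (String name : source.stringPropertyNames()) {
            dest.setProperty(name, source.getProperty(name));
        }
    }

    // Builds a Properties object holding a single key/value pair.
    static Properties props(String key, String value) {
        Properties p = new Properties();
        p.setProperty(key, value);
        return p;
    }

    public static void main(String[] args) {
        Properties a = props("example.key", "from-a");
        mergeInto(a, props("example.key", "from-b"));
        System.out.println(a.getProperty("example.key")); // source value "from-b" wins

        Properties b = props("example.key", "from-b");
        mergeInto(b, props("example.key", "from-a"));
        System.out.println(b.getProperty("example.key")); // source value "from-a" wins
    }
}
```

Swapping the two arguments silently changes which configuration takes precedence, which is easy to miss when reading a patch.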


> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch
>
>
> HExecutionEngine.getS3Conf is wrong. It should only return the s3 config. 
> Currently it returns all the properties, including those from *-site.xml, even 
> in local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", which ends up in the local mode 
> config, and thus we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}





[jira] [Updated] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change

2015-04-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4514:
---
Attachment: PIG-4514.1.patch

> pig trunk compilation is broken - 
> VertexManagerPluginContext.reconfigureVertex change
> -
>
> Key: PIG-4514
> URL: https://issues.apache.org/jira/browse/PIG-4514
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.15.0
>
> Attachments: PIG-4514.1.patch
>
>
> {code}
> src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173:
>  error: exception TezException is never thrown in body of corresponding try 
> statement
> [javac] } catch (TezException e) {
> [javac]   ^
> {code}





[jira] [Created] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change

2015-04-22 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-4514:
--

 Summary: pig trunk compilation is broken - 
VertexManagerPluginContext.reconfigureVertex change
 Key: PIG-4514
 URL: https://issues.apache.org/jira/browse/PIG-4514
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0



{code}
src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173:
 error: exception TezException is never thrown in body of corresponding try 
statement
[javac] } catch (TezException e) {
[javac]   ^
{code}






[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498812#comment-14498812
 ] 

Thejas M Nair commented on PIG-4509:


+1
The change looks good to me.
Thanks Rohini!



> [Pig on Tez] Unassigned applications not killed on shutdown
> ---
>
> Key: PIG-4509
> URL: https://issues.apache.org/jira/browse/PIG-4509
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4509-1.patch, PIG-4509-FixCompileError.patch
>
>
>  tezclient.stop() should be called when tezClient.waitTillReady() is 
> interrupted on shutdown.





[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498787#comment-14498787
 ] 

Thejas M Nair commented on PIG-4509:


It builds fine on my Mac with JDK 7 as well. However, it fails with JDK 7 
in our internal build environment (probably Linux).

The fact that it passes in some setups is certainly very strange. I think we 
should still go ahead and fix this; as far as I know, this should result in a 
compilation error.


> [Pig on Tez] Unassigned applications not killed on shutdown
> ---
>
> Key: PIG-4509
> URL: https://issues.apache.org/jira/browse/PIG-4509
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4509-1.patch
>
>
>  tezclient.stop() should be called when tezClient.waitTillReady() is 
> interrupted on shutdown.





[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498325#comment-14498325
 ] 

Thejas M Nair commented on PIG-4509:


[~rohini] This results in a compilation failure. 

{code}
src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java:105:
 error: unreported exception Throwable; must be caught or declared to be thrown
[javac] throw e;
[javac] ^
{code}
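
One way to keep a broad catch for cleanup without tripping this error is to rethrow only types the method declares (or unchecked ones). A minimal sketch, assuming a hypothetical Session interface standing in for the Tez client; it is neither the TezSessionManager code nor the committed fix:

```java
import java.io.IOException;

final class RethrowSketch {
    // Hypothetical stand-in for a client that must be stopped if startup fails.
    interface Session {
        void start() throws IOException;
        void stop();
    }

    // Catches broadly so cleanup always runs, then rethrows as a type that is
    // either declared by this method or unchecked. This compiles regardless of
    // whether the compiler applies Java 7 style precise rethrow analysis.
    static void startOrStop(Session session) throws IOException {
        try {
            session.start();
        } catch (Throwable t) {
            session.stop();
            if (t instanceof IOException) {
                throw (IOException) t;
            }
            if (t instanceof RuntimeException) {
                throw (RuntimeException) t;
            }
            if (t instanceof Error) {
                throw (Error) t;
            }
            // Any other checked exception is wrapped in the declared type.
            throw new IOException(t);
        }
    }
}
```

The explicit casts are more verbose than a bare "throw e", but they do not depend on compiler source-level settings.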

> [Pig on Tez] Unassigned applications not killed on shutdown
> ---
>
> Key: PIG-4509
> URL: https://issues.apache.org/jira/browse/PIG-4509
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4509-1.patch
>
>
>  tezclient.stop() should be called when tezClient.waitTillReady() is 
> interrupted on shutdown.





[jira] [Created] (PIG-4486) set Tez ACLs appropriately in hive

2015-03-30 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-4486:
--

 Summary: set Tez ACLs appropriately in hive
 Key: PIG-4486
 URL: https://issues.apache.org/jira/browse/PIG-4486
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair


Hive should make the necessary changes to integrate with Tez and Timeline. It 
should pass the necessary ACL-related params to ensure that query execution and 
logs are only visible to the relevant users.

Proposed Changes -
Set session level tez ACL for a super user, to allow modify + view
Set DAG level ACL for the user running the query (the end user), to allow modify + 
view
Determining the super user -
The super user can be configured using hive.tez.admin.user. This can be 
initialized by the Authorization implementation (such as sql standard 
authorization) if it is not already set to a specific value. SQL standard 
authorization would initialize it, if unset, to the sql standard admin user.






[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4331:
---
Status: Patch Available  (was: Open)

> update README, '-x' option in usage to include tez
> --
>
> Key: PIG-4331
> URL: https://issues.apache.org/jira/browse/PIG-4331
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.15.0
>
> Attachments: PIG-4331.1.patch
>
>
> Pig queries can be run using tez, by specifying "pig -x tez". The output of 
> pig --help needs to be updated to indicate that.





[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4331:
---
Attachment: PIG-4331.1.patch

> update README, '-x' option in usage to include tez
> --
>
> Key: PIG-4331
> URL: https://issues.apache.org/jira/browse/PIG-4331
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.15.0
>
> Attachments: PIG-4331.1.patch
>
>
> Pig queries can be run using tez, by specifying "pig -x tez". The output of 
> pig --help needs to be updated to indicate that.





[jira] [Created] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-4331:
--

 Summary: update README, '-x' option in usage to include tez
 Key: PIG-4331
 URL: https://issues.apache.org/jira/browse/PIG-4331
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0


Pig queries can be run using tez, by specifying "pig -x tez". But usage does 
not indicate this.






[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4331:
---
Description: Pig queries can be run using tez, by specifying "pig -x tez". 
The output of pig --help needs to be updated to indicate that.  (was: Pig 
queries can be run using tez, by specifying "pig -x tez". But usage does not 
indicate this.
)

> update README, '-x' option in usage to include tez
> --
>
> Key: PIG-4331
> URL: https://issues.apache.org/jira/browse/PIG-4331
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.15.0
>
>
> Pig queries can be run using tez, by specifying "pig -x tez". The output of 
> pig --help needs to be updated to indicate that.





[jira] [Commented] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209090#comment-14209090
 ] 

Thejas M Nair commented on PIG-4328:


+1

> Upgrade Hive to 0.14
> 
>
> Key: PIG-4328
> URL: https://issues.apache.org/jira/browse/PIG-4328
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4328-1.patch
>
>
> Hive 0.14.0 artifacts are available. We shall switch to use the released 
> version.





[jira] [Commented] (PIG-4250) Fix Security Risks found by Coverity

2014-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191114#comment-14191114
 ] 

Thejas M Nair commented on PIG-4250:


Blocks such as the following aren't necessary - 
{code}
} catch (IOException e) {
    throw e;
}
{code}

You can use Hadoop IOUtils.closeStream or cleanup to call close. You don't need 
the additional try-catch and null check with that.
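
A minimal sketch of the pattern, kept self-contained by using a local closeQuietly helper as a stand-in for Hadoop's IOUtils.closeStream (the helper, class, and method names here are hypothetical; in Pig code one would call the Hadoop utility directly):

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

final class CloseStreamSketch {
    // Stand-in for Hadoop's IOUtils.closeStream: tolerates null and ignores
    // any IOException raised by close().
    static void closeQuietly(Closeable stream) {
        if (stream != null) {
            try {
                stream.close();
            } catch (IOException ignored) {
                // nothing useful can be done if close itself fails
            }
        }
    }

    // Reads the first byte of the data. There is no catch-and-rethrow block:
    // an IOException from read() propagates on its own, and the finally
    // clause closes the stream without a null check at the call site.
    static int firstByte(byte[] data) throws IOException {
        InputStream in = null;
        try {
            in = new ByteArrayInputStream(data);
            return in.read();
        } finally {
            closeQuietly(in);
        }
    }
}
```

The finally clause covers both the normal and the exceptional path, which addresses the "stream not closed when an exception happens" pattern.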


> Fix Security Risks found by Coverity
> 
>
> Key: PIG-4250
> URL: https://issues.apache.org/jira/browse/PIG-4250
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4250-1.patch, PIG-4250-2.patch
>
>
> Here is the report: https://scan.coverity.com/projects/3026 (registration is 
> needed to see it). Most belong to one pattern: a stream is not closed when an 
> exception happens.





[jira] [Commented] (PIG-4160) Provide a way to pass local jars in pig.additional.jars when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190848#comment-14190848
 ] 

Thejas M Nair commented on PIG-4160:


+1

> Provide a way to pass local jars in pig.additional.jars when using a remote 
> url for a script
> 
>
> Key: PIG-4160
> URL: https://issues.apache.org/jira/browse/PIG-4160
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.14.0
>Reporter: Andrew C. Oliver
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: PIG-4160-2.patch, PIG-4160-3.patch, PIG-4160-4.patch, 
> forcelocal.trunk.patch, forcelocal.withtests.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> patch adds a -j/forcelocaljars flag which if enabled allows you to do 
> pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
> thus loading the pig script REMOTELY 
> while loading the jar files LOCALLY
> One does this to avoid a single point of failure but avoid one central 
> interversion dependent repository for all the jars across all teams/projects.





[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190819#comment-14190819
 ] 

Thejas M Nair commented on PIG-4160:


I think it would be useful to retain a mention of the old additional.jars 
parameter in the web docs, saying that it is similar to .jars.comma but 
separated by the OS path separator character, and that it does not allow a 
scheme to be specified (and that the use of jars.comma is preferred).


> -forcelocaljars / -j flag when using a remote url for a script
> --
>
> Key: PIG-4160
> URL: https://issues.apache.org/jira/browse/PIG-4160
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.14.0
>Reporter: Andrew C. Oliver
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: PIG-4160-2.patch, PIG-4160-3.patch, 
> forcelocal.trunk.patch, forcelocal.withtests.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> patch adds a -j/forcelocaljars flag which if enabled allows you to do 
> pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
> thus loading the pig script REMOTELY 
> while loading the jar files LOCALLY
> One does this to avoid a single point of failure but avoid one central 
> interversion dependent repository for all the jars across all teams/projects.





[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190821#comment-14190821
 ] 

Thejas M Nair commented on PIG-4160:


I think we should also change the jira title and description to indicate what 
is being implemented.


> -forcelocaljars / -j flag when using a remote url for a script
> --
>
> Key: PIG-4160
> URL: https://issues.apache.org/jira/browse/PIG-4160
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.14.0
>Reporter: Andrew C. Oliver
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: PIG-4160-2.patch, PIG-4160-3.patch, 
> forcelocal.trunk.patch, forcelocal.withtests.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> patch adds a -j/forcelocaljars flag which if enabled allows you to do 
> pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
> thus loading the pig script REMOTELY 
> while loading the jar files LOCALLY
> One does this to avoid a single point of failure but avoid one central 
> interversion dependent repository for all the jars across all teams/projects.





[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190713#comment-14190713
 ] 

Thejas M Nair commented on PIG-4160:


Daniel, can you also include the doc edits?
Also, the commented line "#HADOOP_OPTS" can be deleted.


> -forcelocaljars / -j flag when using a remote url for a script
> --
>
> Key: PIG-4160
> URL: https://issues.apache.org/jira/browse/PIG-4160
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.14.0
>Reporter: Andrew C. Oliver
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: PIG-4160-2.patch, forcelocal.trunk.patch, 
> forcelocal.withtests.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> patch adds a -j/forcelocaljars flag which if enabled allows you to do 
> pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
> thus loading the pig script REMOTELY 
> while loading the jar files LOCALLY
> One does this to avoid a single point of failure but avoid one central 
> interversion dependent repository for all the jars across all teams/projects.





[jira] [Commented] (PIG-4151) Pig Cannot Write Empty Maps to HBase

2014-10-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171925#comment-14171925
 ] 

Thejas M Nair commented on PIG-4151:


+1

> Pig Cannot Write Empty Maps to HBase
> 
>
> Key: PIG-4151
> URL: https://issues.apache.org/jira/browse/PIG-4151
> Project: Pig
>  Issue Type: Bug
>  Components: internal-udfs
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4151-1.patch
>
>
> Pig is unable to write empty maps to HBase. Steps to reproduce:
> Input file pig_data_bad.txt:
> {code}
> row1;Homer;Morrison;[1#Silvia,2#Stacy]
> row2;Sheila;Fletcher;[1#Becky,2#Salvador,3#Lois]
> row4;Andre;Morton;[1#Nancy]
> row3;Sonja;Webb;[]
> {code}
> Create table in hbase:
> create 'test', 'info', 'friends'
> Pig script:
> {code}
> source = LOAD '/pig_data_bad.txt' USING PigStorage(';') AS (row:chararray, 
> first_name:chararray, last_name:chararray, friends:map[]);
> STORE source INTO 'hbase://test' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:fname info:lname 
> friends:*');
> {code}
> Stack:
> java.lang.NullPointerException
> at 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:880)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
> at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)





[jira] [Commented] (PIG-4141) Ship UDF/LoadFunc/StoreFunc dependent jar automatically

2014-09-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136229#comment-14136229
 ] 

Thejas M Nair commented on PIG-4141:


+1

> Ship UDF/LoadFunc/StoreFunc dependent jar automatically
> ---
>
> Key: PIG-4141
> URL: https://issues.apache.org/jira/browse/PIG-4141
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4141-1.patch, PIG-4141-2.patch, PIG-4141-3.patch, 
> PIG-4141-4.patch, PIG-4141-5.patch
>
>
> When users use AvroStorage/JsonStorage/OrcStorage, they need to register 
> dependent jars manually. It would be much more convenient if we could provide a 
> mechanism for UDF/LoadFunc/StoreFunc to declare the dependency and ship jars 
> automatically.





[jira] [Commented] (PIG-4128) New logical optimizer rule: ConstantCalculator

2014-08-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111505#comment-14111505
 ] 

Thejas M Nair commented on PIG-4128:


+1

> New logical optimizer rule: ConstantCalculator
> --
>
> Key: PIG-4128
> URL: https://issues.apache.org/jira/browse/PIG-4128
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4128-1.patch, PIG-4128-2.patch, PIG-4128-3.patch
>
>
> Pig used to have a LogicExpressionSimplifier to simplify expressions, which 
> also calculated constant expressions. The optimizer rule is buggy and we 
> disabled it by default in PIG-2316.
> However, we do need this feature, especially in partition/predicate push down. 
> Since neither deals with complex constant expressions, we'd like to 
> replace the expression with a constant before the actual push down. Yes, the user 
> may manually do the calculation and rewrite the query, but even a rewrite is 
> sometimes not possible. Consider the case where the user wants to push a datetime 
> predicate: the user has to write a ToDate UDF since Pig does not have a datetime 
> constant.
> In this Jira, I provide a new rule, ConstantCalculator, which is much simpler 
> and much less error prone, to replace LogicExpressionSimplifier.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3522) Remove shock from pig

2013-11-01 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811774#comment-13811774
 ] 

Thejas M Nair commented on PIG-3522:


+1

> Remove shock from pig
> -
>
> Key: PIG-3522
> URL: https://issues.apache.org/jira/browse/PIG-3522
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.13.0
>
> Attachments: PIG-3522-1.patch
>
>
> It is only used in very ancient Hadoop versions that use HOD as the resource 
> manager. Current Pig code does not use it. This includes the entire 
> lib-src/shock directory and jsch.jar.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3503) More document for Pig 0.12 new features

2013-10-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787319#comment-13787319
 ] 

Thejas M Nair commented on PIG-3503:


Everything else looks good. You can commit after the changes.


> More document for Pig 0.12 new features
> ---
>
> Key: PIG-3503
> URL: https://issues.apache.org/jira/browse/PIG-3503
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3503-1.patch
>
>






[jira] [Commented] (PIG-3503) More document for Pig 0.12 new features

2013-10-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787318#comment-13787318
 ] 

Thejas M Nair commented on PIG-3503:


"If use set command without providing key/value pair, Pig print all the 
configurations and all system properties. " can be changed to 
"If set command is used without key/value pair argument, Pig prints all the 
configurations and system properties."

In perf.xml:
In the example, should we use a load function that supports partition filter 
pushdown? Otherwise, people might expect it to work with PigStorage.
Also, should the example without the filter statement be removed?
{code}
+
+A = LOAD 'input' as (dt, state, event);
+
{code}


> More document for Pig 0.12 new features
> ---
>
> Key: PIG-3503
> URL: https://issues.apache.org/jira/browse/PIG-3503
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3503-1.patch
>
>






[jira] [Commented] (PIG-3360) Some intermittent negative e2e tests fail on hadoop 2

2013-09-24 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776935#comment-13776935
 ] 

Thejas M Nair commented on PIG-3360:


Looks good. +1


> Some intermittent negative e2e tests fail on hadoop 2
> -
>
> Key: PIG-3360
> URL: https://issues.apache.org/jira/browse/PIG-3360
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3360-1.patch, PIG-3360-2.patch
>
>
> One example is StreamingErrors_2. Here is the stack we get:
> Backend error message
> -
> Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: 
> Received Error while processing the map plan: 'perl PigStreamingBad.pl middle 
> ' failed with exit status: 2
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:311)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Pig Stack Trace
> ---
> ERROR 2244: Job failed, hadoop does not return any error message
> org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, 
> hadoop does not return any error message
>   at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:145)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>   at org.apache.pig.Main.run(Main.java:604)
>   at org.apache.pig.Main.main(Main.java:157)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions

2013-03-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614137#comment-13614137
 ] 

Thejas M Nair commented on PIG-3259:


bq.  How do we determine the number of non-numbers without making calls to 
sanityCheck..()?
By counting the number of times an exception has been thrown so far by 
.valueOf(). Once a threshold has been crossed, we can introduce the sanity 
check for each new value. This puts a limit on the worst ('incorrect') case 
performance without degrading the 'correct' case performance by much.

I wonder if there are good libraries that we can use for the sanity checks, as 
the decimal check seems a bit more complicated.
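
A minimal sketch of the threshold idea described above. The class name `AdaptiveNumberParser`, the threshold of 3, and the regex used as a stand-in sanity check are all assumptions made for illustration; none of them come from the Pig code base or from the patch under discussion.

```java
/**
 * Hypothetical sketch, not Pig code: parse optimistically with
 * Double.valueOf() and switch on a cheap pre-check only after enough
 * NumberFormatExceptions have been seen.
 */
class AdaptiveNumberParser {
    // Assumed value for illustration; the comment proposes no specific threshold.
    private static final int FAILURE_THRESHOLD = 3;

    private int failures = 0;

    /** Stand-in sanity check: optional sign, digits, optional fraction. */
    static boolean looksNumeric(String s) {
        return s.matches("[+-]?\\d+(\\.\\d+)?");
    }

    boolean sanityCheckEnabled() {
        return failures >= FAILURE_THRESHOLD;
    }

    /** Returns the value truncated to a long, or null if s is not numeric. */
    Long toLong(String s) {
        if (sanityCheckEnabled() && !looksNumeric(s)) {
            return null; // rejected cheaply, no exception thrown
        }
        try {
            return Double.valueOf(s).longValue();
        } catch (NumberFormatException e) {
            failures++; // count the bad inputs seen so far
            return null;
        }
    }

    public static void main(String[] args) {
        AdaptiveNumberParser parser = new AdaptiveNumberParser();
        for (String s : new String[] {"42", "x", "y", "z", "1234abcd", "12.5"}) {
            System.out.println(s + " -> " + parser.toLong(s)
                    + " (check enabled: " + parser.sanityCheckEnabled() + ")");
        }
    }
}
```

With this shape, input that is consistently well formed never pays for the pre-check, while input with many bad values stops paying for exceptions once the threshold is reached. Note that the stand-in regex would also reject valid exponent or hexadecimal forms once enabled.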

> Optimize byte to Long/Integer conversions
> -
>
> Key: PIG-3259
> URL: https://issues.apache.org/jira/browse/PIG-3259
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.11.1
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric 
> (1234abcd), the code calls Double.valueOf(String) regardless before finally 
> returning null. Any script that inadvertently (user's mistake or not) tries 
> to cast a non-numeric column to int or long would result in many wasteful 
> calls. 
> We can avoid this by only handling the cases where we find the input to be a 
> decimal number (1234.56) and returning null otherwise, even before trying 
> Double.valueOf(String).



[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions

2013-03-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613354#comment-13613354
 ] 

Thejas M Nair commented on PIG-3259:


Sounds like a good idea. 
The check you have here does not accept all valid double string representations 
(see 
http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#valueOf(java.lang.String)
 ), e.g. with an exponent, or a hexadecimal representation starting with 0x.

But if we can avoid the performance degradation for the 'correct' [1] case 
(which seems to be in the range of 2-8% in the micro benchmark that ran for at 
least a few seconds), that would be better. One way to avoid performance 
degradation for the 'correct' case would be to start by doing .valueOf() without 
checks, then use the number of non-numbers encountered to decide whether we want 
to make the sanityCheckIntegerLongDecimal() calls.

[1] By 'correct' I mean the case where a field declared as an integer or a 
double has a correct representation.
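
To illustrate the point about valid representations, here is a small sketch showing inputs that `Double.valueOf(String)` accepts but that a simple digits-and-dot check rejects. The `naiveDecimalCheck` method is a hypothetical stand-in written for this example; it is not the `sanityCheckIntegerLongDecimal()` method from the patch.

```java
/** Hypothetical sketch: forms accepted by Double.valueOf() that a naive check rejects. */
class DoubleFormats {
    /** Naive stand-in check (not Pig's actual method): sign, digits, optional fraction. */
    static boolean naiveDecimalCheck(String s) {
        return s.matches("[+-]?\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        // Plain decimal, decimal with exponent, hexadecimal floating point.
        for (String s : new String[] {"1234.56", "1e3", "0x1.8p1"}) {
            System.out.println(s + ": valueOf=" + Double.valueOf(s)
                    + ", naiveCheck=" + naiveDecimalCheck(s));
        }
    }
}
```

All three strings parse successfully, but only the first passes the naive check, so a pre-check of this kind would turn some valid values into nulls.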

> Optimize byte to Long/Integer conversions
> -
>
> Key: PIG-3259
> URL: https://issues.apache.org/jira/browse/PIG-3259
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.11.1
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric 
> (1234abcd), the code calls Double.valueOf(String) regardless before finally 
> returning null. Any script that inadvertently (user's mistake or not) tries 
> to cast a non-numeric column to int or long would result in many wasteful 
> calls. 
> We can avoid this by only handling the cases where we find the input to be a 
> decimal number (1234.56) and returning null otherwise, even before trying 
> Double.valueOf(String).



[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605615#comment-13605615
 ] 

Thejas M Nair commented on PIG-3248:


+1

> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>




[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605584#comment-13605584
 ] 

Thejas M Nair commented on PIG-3248:


bq. This is actually a different issue. TestPigSplit always fail on my test 
machine due to stack overflow. It is Ok if you want me open a separate Jira for 
this issue.
As it is a very minor change to a test case, I think it's fine to include it 
here. 200 is large enough for the number of statements, so I think using that 
many levels instead of 500 should be OK if 500 is causing issues with some JVM 
configs.



> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>




[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605460#comment-13605460
 ] 

Thejas M Nair commented on PIG-3248:


Daniel, is the following change relevant to the Hadoop 2 version upgrade?

{code}
===
--- test/org/apache/pig/test/TestPigSplit.java  (revision 1456106)
+++ test/org/apache/pig/test/TestPigSplit.java  (working copy)
@@ -108,7 +108,7 @@
 createInput(new String[] { "0\ta" });
 
 pigServer.registerQuery("a = load '" + inputFileName + "';");
-for (int i = 0; i < 500; i++) {
+for (int i = 0; i < 200; i++) {
 pigServer.registerQuery("a = filter a by $0 == '1';");
 }
 Iterator iter = pigServer.openIterator("a");
{code}

> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>




[jira] [Updated] (PIG-3214) New/improved mascot

2013-03-08 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-3214:
---

Attachment: pig_6_lc_g.JPG

I think a lowercase 'g' in Julien's suggestion will make the tail attachment 
look better. Attaching pig_6_lc_g.JPG to illustrate what I am thinking.

> New/improved mascot
> ---
>
> Key: PIG-3214
> URL: https://issues.apache.org/jira/browse/PIG-3214
> Project: Pig
>  Issue Type: Wish
>  Components: site
>Affects Versions: 0.11
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.12
>
> Attachments: apache-pig-14.png, apache-pig-yellow-logo.png, 
> newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png, 
> new_logo_7.png, pig_6.JPG, pig_6_lc_g.JPG, pig-logo-10.png, pig-logo-11.png, 
> pig-logo-12.png, pig-logo-13.png, pig-logo-8a.png, pig-logo-8b.png, 
> pig-logo-9a.png, pig-logo-9b.png, pig_logo_new.png
>
>
> Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3089) Implicit relation names

2012-12-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529125#comment-13529125
 ] 

Thejas M Nair commented on PIG-3089:


In my opinion, too many rules for implicit relation names would make Pig 
scripts (written by others) hard to read, especially for people who are new to 
Pig. I think it is better to just allow the name of the preceding relation to 
be referred to using a special notation.

> Implicit relation names
> ---
>
> Key: PIG-3089
> URL: https://issues.apache.org/jira/browse/PIG-3089
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt, parser
>Reporter: Russell Jurney
>Assignee: Jonathan Coveney
>
> A = load foo;
> B = load bar;
> filter A by id > 5;
> join A_1 by id, B by id;
> // or A_filter
> foreach A_1_B generate id;
> store into foobar; // A_1_B_1 or A_filter_B_generate
> Or some such routine?
> We don't have to be explicit no more!



[jira] [Commented] (PIG-3044) Trigger POPartialAgg compaction under GC pressure

2012-12-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526600#comment-13526600
 ] 

Thejas M Nair commented on PIG-3044:


bq. I would even say we remove the % memory budget as the Spillable mechanism 
is more reliable and much simpler.
The reason the % memory budget was introduced for SelfSpillBag was that the 
spillable mechanism didn't always work well. The cleanup was often getting 
triggered too late. So I think it is better to use the Spillable mechanism here 
to spill earlier if necessary, as the patch is doing.


> Trigger POPartialAgg compaction under GC pressure
> -
>
> Key: PIG-3044
> URL: https://issues.apache.org/jira/browse/PIG-3044
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0, 0.11, 0.10.1
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.11, 0.12
>
> Attachments: PIG-3044.2.diff, PIG-3404.diff
>
>
> If partial aggregation is turned on in pig 10 and 11, 20% (by default) of the 
> available heap can be consumed by the POPartialAgg operator. This can cause 
> memory issues for jobs that use all, or nearly all, of the heap already.
> If we make POPartialAgg "spillable" (trigger compaction when memory reduction 
> is required), we would be much nicer to high-memory jobs.



[jira] [Updated] (PIG-3071) update hcatalog jar and path to hbase storage handler jar

2012-11-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-3071:
---

Labels: hcatalog  (was: )

> update hcatalog jar and path to hbase storage handler jar
> -
>
> Key: PIG-3071
> URL: https://issues.apache.org/jira/browse/PIG-3071
> Project: Pig
>  Issue Type: Bug
>Reporter: Arpit Gupta
>  Labels: hcatalog
> Attachments: PIG-3071.patch
>
>
> Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar 
> name and the path to the hbase storage handler jar.
> pig script should be updated to work with either version.



[jira] [Updated] (PIG-2176) add logical plan assumption checker

2012-11-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2176:
---

Assignee: Thejas M Nair

> add logical plan assumption checker 
> 
>
> Key: PIG-2176
> URL: https://issues.apache.org/jira/browse/PIG-2176
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.10.0
>
> Attachments: PIG-2176.1.patch, PIG-2176.2.patch
>
>
> Pig expects certain things about LogicalPlan, and optimizer logic depends on 
> those being true. Code that verifies that these assumptions hold will 
> help in catching issues early on. 
> Some of the assumptions that should be checked:
> 1. All schemas have a valid uid (not -1).
> 2. All fields in a schema have distinct uids.
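
The two assumptions listed above could be checked along these lines. This is only a sketch under stated assumptions: `UidAssumptionChecker` is a hypothetical name, and the schema is modelled as a plain list of field uids rather than with Pig's actual LogicalPlan and schema classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not Pig code: checks the two uid assumptions on a list of field uids. */
class UidAssumptionChecker {
    static final long INVALID_UID = -1;

    /** Returns null if both assumptions hold, otherwise a description of the first violation. */
    static String check(List<Long> fieldUids) {
        Set<Long> seen = new HashSet<>();
        for (long uid : fieldUids) {
            if (uid == INVALID_UID) {
                return "field has invalid uid -1";
            }
            if (!seen.add(uid)) { // add() returns false if the uid was already present
                return "duplicate uid " + uid;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check(Arrays.asList(1L, 2L, 3L)));
        System.out.println(check(Arrays.asList(1L, -1L)));
        System.out.println(check(Arrays.asList(1L, 1L)));
    }
}
```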



[jira] [Commented] (PIG-2959) Add a pig.cmd for Pig to run under Windows

2012-11-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495092#comment-13495092
 ] 

Thejas M Nair commented on PIG-2959:


+1

> Add a pig.cmd for Pig to run under Windows
> --
>
> Key: PIG-2959
> URL: https://issues.apache.org/jira/browse/PIG-2959
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: pig.cmd
>
>




[jira] [Commented] (PIG-2980) documentation for DateTime datatype

2012-11-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494274#comment-13494274
 ] 

Thejas M Nair commented on PIG-2980:


bq. Yes, I mean ToDate('1970-01-01T00:00:00.000+00:00'). where users can 
specify a constant string to create a datetime object. Let me rephrase the 
description here.
I think we can just remove datestamp from the constants table and add a note 
under the table that users should use the ToDate UDF to generate a datetime 
from string constants.
 

> documentation for DateTime datatype
> ---
>
> Key: PIG-2980
> URL: https://issues.apache.org/jira/browse/PIG-2980
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Thejas M Nair
>Assignee: Zhijie Shen
> Fix For: 0.11
>
> Attachments: PIG-2980.patch
>
>
> Documentation for new DateTime type needs to be added.



[jira] [Commented] (PIG-2980) documentation for DateTime datatype

2012-11-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492934#comment-13492934
 ] 

Thejas M Nair commented on PIG-2980:


Thanks for the patch, Zhijie. It looks good.
However, it says that datestamp constants are supported, but I believe that if 
you pass '1970-01-01T00:00:00.000+00:00' to Pig (say, as an argument to a UDF), 
it would get interpreted as a string. I.e., we support chararray constants that 
can be cast to datetime, but not a datetime constant per se. Is that correct? 
(I think it makes sense to support datetime constants, using a format that does 
not cause ambiguity with respect to the chararray type. But that would be 
another JIRA.)



> documentation for DateTime datatype
> ---
>
> Key: PIG-2980
> URL: https://issues.apache.org/jira/browse/PIG-2980
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Thejas M Nair
>Assignee: Zhijie Shen
> Fix For: 0.11
>
> Attachments: PIG-2980.patch
>
>
> Documentation for new DateTime type needs to be added.



[jira] [Commented] (PIG-2981) add e2e tests for DateTime data type

2012-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486695#comment-13486695
 ] 

Thejas M Nair commented on PIG-2981:


I will try to do it this week and get it into 0.11. But for now I am setting a 
fix version of 0.12, just in case.

> add e2e tests for DateTime  data type
> -
>
> Key: PIG-2981
> URL: https://issues.apache.org/jira/browse/PIG-2981
> Project: Pig
>  Issue Type: Test
>Reporter: Thejas M Nair
> Fix For: 0.12
>
>
> e2e tests for DateTime datatype need to be added.



[jira] [Commented] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10

2012-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486693#comment-13486693
 ] 

Thejas M Nair commented on PIG-2434:


Unlinking from 0.11.

> investigate 5% slowdown in TPC-H Q6 query in 0.10
> -
>
> Key: PIG-2434
> URL: https://issues.apache.org/jira/browse/PIG-2434
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Thejas M Nair
>
> 0.10 is slower than 0.9 by around 5% for TPC-H Q6 query as per observation in 
> https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461
>  .
> This needs to be investigated.



[jira] [Updated] (PIG-2981) add e2e tests for DateTime data type

2012-10-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2981:
---

Fix Version/s: (was: 0.11)
   0.12

> add e2e tests for DateTime  data type
> -
>
> Key: PIG-2981
> URL: https://issues.apache.org/jira/browse/PIG-2981
> Project: Pig
>  Issue Type: Test
>Reporter: Thejas M Nair
> Fix For: 0.12
>
>
> e2e tests for DateTime datatype need to be added.



[jira] [Updated] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10

2012-10-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2434:
---

Fix Version/s: (was: 0.11)

> investigate 5% slowdown in TPC-H Q6 query in 0.10
> -
>
> Key: PIG-2434
> URL: https://issues.apache.org/jira/browse/PIG-2434
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Thejas M Nair
>
> 0.10 is slower than 0.9 by around 5% for TPC-H Q6 query as per observation in 
> https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461
>  .
> This needs to be investigated.



[jira] [Created] (PIG-3007) support group-by collected for load funcs that don't implement CollectableLoadFunc

2012-10-25 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-3007:
--

 Summary: support group-by collected for load funcs that don't 
implement CollectableLoadFunc
 Key: PIG-3007
 URL: https://issues.apache.org/jira/browse/PIG-3007
 Project: Pig
  Issue Type: New Feature
Reporter: Thejas M Nair


Collected group-by should be supported for all inputs that are sorted on the 
group-by keys.
To ensure that a map task gets all records for a group key, an index can be 
used to determine the key at which the task should start processing, and 
whether it should also read from the next split to get the remaining records 
for the last group-by key in its original split.
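
For illustration only, here is a minimal sketch of the boundary rule described above. It assumes the sorted input can be modelled as an in-memory array of keys and that a split is a half-open index range; the class and method names are hypothetical and not part of Pig, and the linear scan stands in for whatever index lookup an actual implementation would use.

```java
import java.util.Arrays;

public class CollectedGroupRanges {

    /**
     * Returns {start, end} (end exclusive): the records a map task should
     * process so that every group key is handled by exactly one task.
     * sortedKeys stands in for the sorted input (an assumption of this sketch);
     * [splitStart, splitEnd) is the task's original split.
     */
    static int[] effectiveRange(int[] sortedKeys, int splitStart, int splitEnd) {
        int start = splitStart;
        if (splitStart > 0) {
            // Records that continue the previous split's last key belong to
            // the previous task, so skip them.
            int prevLastKey = sortedKeys[splitStart - 1];
            while (start < splitEnd && sortedKeys[start] == prevLastKey) {
                start++;
            }
        }
        if (start >= splitEnd) {
            // The whole split continues a group owned by an earlier task.
            return new int[] {splitEnd, splitEnd};
        }
        // Read into the following split(s) until the last key of this split ends.
        int end = splitEnd;
        int lastKey = sortedKeys[splitEnd - 1];
        while (end < sortedKeys.length && sortedKeys[end] == lastKey) {
            end++;
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int[] keys = {1, 1, 2, 2, 2, 3, 3, 4};
        int[][] splits = {{0, 3}, {3, 6}, {6, 8}};
        for (int[] s : splits) {
            System.out.println(Arrays.toString(effectiveRange(keys, s[0], s[1])));
        }
    }
}
```

With the sample keys, the three tasks cover the index ranges [0, 5), [5, 7) and [7, 8), so no group is divided between tasks.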




[jira] [Commented] (PIG-2982) add unit tests for DateTime type that test setting timezone

2012-10-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482047#comment-13482047
 ] 

Thejas M Nair commented on PIG-2982:


+1. Will commit after running tests.


> add unit tests for DateTime type that test setting timezone
> ---
>
> Key: PIG-2982
> URL: https://issues.apache.org/jira/browse/PIG-2982
> Project: Pig
>  Issue Type: Test
>Reporter: Thejas M Nair
>Assignee: Zhijie Shen
> Fix For: 0.11
>
> Attachments: PIG-2982.patch
>
>
> The default timezone can be set for the new DateTime type. We need to add 
> unit tests that test this functionality. 



[jira] [Commented] (PIG-2980) documentation for DateTime datatype

2012-10-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481936#comment-13481936
 ] 

Thejas M Nair commented on PIG-2980:


Olga,
Zhijie is planning to work on this. If you can help with the formatting, that 
would be great!

> documentation for DateTime datatype
> ---
>
> Key: PIG-2980
> URL: https://issues.apache.org/jira/browse/PIG-2980
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Thejas M Nair
> Fix For: 0.11
>
>
> Documentation for new DateTime type needs to be added.



[jira] [Resolved] (PIG-1314) Add DateTime Support to Pig

2012-10-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1314.


Resolution: Fixed

As Dmitriy suggested, I am closing this jira and have opened new ones for the 
remaining work: PIG-2980, PIG-2981, PIG-2982.


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012



[jira] [Created] (PIG-2982) add unit tests for DateTime type that test setting timezone

2012-10-15 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-2982:
--

 Summary: add unit tests for DateTime type that test setting 
timezone
 Key: PIG-2982
 URL: https://issues.apache.org/jira/browse/PIG-2982
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
 Fix For: 0.11


The default timezone can be set for the new DateTime type. We need to add unit 
tests that test this functionality. 




[jira] [Created] (PIG-2981) add e2e tests for DateTime data type

2012-10-15 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-2981:
--

 Summary: add e2e tests for DateTime data type
 Key: PIG-2981
 URL: https://issues.apache.org/jira/browse/PIG-2981
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
 Fix For: 0.11


e2e tests for DateTime datatype need to be added.




[jira] [Created] (PIG-2980) documentation for DateTime datatype

2012-10-15 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-2980:
--

 Summary: documentation for DateTime datatype
 Key: PIG-2980
 URL: https://issues.apache.org/jira/browse/PIG-2980
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
 Fix For: 0.11


Documentation for new DateTime type needs to be added.




[jira] [Updated] (PIG-2910) Add function to read schema from output of Schema.toString()

2012-10-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1 Patch committed to trunk.
Eli, Thanks for the patch!

> Add function to read schema from output of Schema.toString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
>  Issue Type: Improvement
>  Components: impl, parser
>Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Eli Reisman
>  Labels: newbie
> Fix For: 0.11
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
> PIG-2910-4.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Updated] (PIG-2910) Add function to read schema from output of Schema.toString()

2012-10-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Fix Version/s: (was: 0.10.1)
   Status: Patch Available  (was: Open)

> Add function to read schema from output of Schema.toString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
>  Issue Type: Improvement
>  Components: impl, parser
>Affects Versions: 0.10.0, 0.9.2, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Eli Reisman
>  Labels: newbie
> Fix For: 0.11
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
> PIG-2910-4.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Updated] (PIG-2910) Add function to read schema from output of Schema.toString()

2012-10-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Summary: Add function to read schema from output of Schema.toString()  
(was: Make toString() methods on Schema and FieldSchema be readable by 
Utils.getSchemaFromString())

> Add function to read schema from output of Schema.toString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
>  Issue Type: Improvement
>  Components: impl, parser
>Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Eli Reisman
>  Labels: newbie
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
> PIG-2910-4.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Issue Type: Improvement  (was: Bug)

> Make toString() methods on Schema and FieldSchema be readable by 
> Utils.getSchemaFromString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
>  Issue Type: Improvement
>  Components: impl, parser
>Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Eli Reisman
>  Labels: newbie
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
> PIG-2910-4.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Assignee: Eli Reisman  (was: Thejas M Nair)

> Make toString() methods on Schema and FieldSchema be readable by 
> Utils.getSchemaFromString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
>  Issue Type: Bug
>  Components: impl, parser
>Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Eli Reisman
>  Labels: newbie
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
> PIG-2910-4.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Commented] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473535#comment-13473535
 ] 

Thejas M Nair commented on PIG-2910:


Yes, the changes in the 2910-3 patch look good. Can you please add a test case, 
and also add a comment that this schema string has "{}" around it, and that is 
why the substring is being done?


> Make toString() methods on Schema and FieldSchema be readable by 
> Utils.getSchemaFromString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
>  Issue Type: Bug
>  Components: impl, parser
>Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Thejas M Nair
>  Labels: newbie
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Commented] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472941#comment-13472941
 ] 

Thejas M Nair commented on PIG-2910:


The extra brackets added by Schema.toString() are curly braces. I believe this 
is because the schema, when thought of as the schema of a relation, is the 
schema of a bag. I don't think PIG-2910-2.patch will fix the issue.

But changing the behavior of either Schema.toString() or 
Utils.getSchemaFromString() will break backward compatibility. I think we 
should just add a new function, Utils.getSchemaFromBagSchemaString(), and 
document that people should use it to get a schema back from the output of 
Schema.toString(). 

Using the input schema of a UDF during UDF execution is not very common. I 
don't think we should serialize it for all UDFs.
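
As a rough sketch of what such a helper would need to do before handing the string to Utils.getSchemaFromString(): strip the one pair of curly braces that encloses the output. The class and method names below are hypothetical, the sample string is only an assumed shape for Schema.toString() output, and the code deliberately avoids any Pig dependency.

```java
public class BagSchemaString {

    /**
     * Strips one pair of enclosing curly braces, if present. Strings without
     * the enclosing braces are returned trimmed but otherwise unchanged.
     */
    static String stripOuterBraces(String schemaString) {
        String trimmed = schemaString.trim();
        if (trimmed.length() >= 2
                && trimmed.charAt(0) == '{'
                && trimmed.charAt(trimmed.length() - 1) == '}') {
            return trimmed.substring(1, trimmed.length() - 1);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Assumed example of what Schema.toString() might return.
        System.out.println(stripOuterBraces("{a: int,b: chararray}"));
    }
}
```

Unlike the unconditional substring(1, length - 1) in the workaround quoted in the issue, this variant leaves a string without enclosing braces intact, so it is safe to call on either form.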


> Make toString() methods on Schema and FieldSchema be readable by 
> Utils.getSchemaFromString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
>  Issue Type: Bug
>  Components: impl, parser
>Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Thejas M Nair
>  Labels: newbie
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-10-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469592#comment-13469592
 ] 

Thejas M Nair commented on PIG-2769:


I have assigned it to you. I have also added you to the contributors list, so 
you should now be able to assign jiras to yourself.


> a simple logic causes very long compiling time on pig 0.10.0
> 
>
> Key: PIG-2769
> URL: https://issues.apache.org/jira/browse/PIG-2769
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.10.0
> Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
>Reporter: Dan Li
>Assignee: Timothy Chen
> Fix For: 0.11
>
> Attachments: case1.tar
>
>
> We found the following simple logic will cause very long compiling time for 
> pig 0.10.0, while using pig 0.8.1, everything is fine.
> A = load 'A.txt' using PigStorage()  AS (m: int);
> B = FOREACH A {
> days_str = (chararray)
> (m == 1 ? 31: 
> (m == 2 ? 28: 
> (m == 3 ? 31: 
> (m == 4 ? 30: 
> (m == 5 ? 31: 
> (m == 6 ? 30: 
> (m == 7 ? 31: 
> (m == 8 ? 31: 
> (m == 9 ? 30: 
> (m == 10 ? 31: 
> (m == 11 ? 30:31)))))))))));
> GENERATE
>days_str as days_str;
> }   
> store B into 'B';
> and here's a simple input file example: A.txt
> 1
> 2
> 3
> The pig version we used in the test
> Apache Pig version 0.10.0-SNAPSHOT (rexported)



[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-10-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2769:
---

Assignee: Timothy Chen

> a simple logic causes very long compiling time on pig 0.10.0
> 
>
> Key: PIG-2769
> URL: https://issues.apache.org/jira/browse/PIG-2769
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.10.0
> Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
>Reporter: Dan Li
>Assignee: Timothy Chen
> Fix For: 0.11
>
> Attachments: case1.tar
>
>
> We found the following simple logic will cause very long compiling time for 
> pig 0.10.0, while using pig 0.8.1, everything is fine.
> A = load 'A.txt' using PigStorage()  AS (m: int);
> B = FOREACH A {
> days_str = (chararray)
> (m == 1 ? 31: 
> (m == 2 ? 28: 
> (m == 3 ? 31: 
> (m == 4 ? 30: 
> (m == 5 ? 31: 
> (m == 6 ? 30: 
> (m == 7 ? 31: 
> (m == 8 ? 31: 
> (m == 9 ? 30: 
> (m == 10 ? 31: 
> (m == 11 ? 30:31)))))))))));
> GENERATE
>days_str as days_str;
> }   
> store B into 'B';
> and here's a simple input file example: A.txt
> 1
> 2
> 3
> The pig version we used in the test
> Apache Pig version 0.10.0-SNAPSHOT (rexported)



[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2911:
---

Resolution: Invalid
Status: Resolved  (was: Patch Available)

Sorry, I created the bug on the wrong product! :)


> GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
> 
>
> Key: PIG-2911
> URL: https://issues.apache.org/jira/browse/PIG-2911
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: PIG-2911.1.patch
>
>
> This causes testcase skewjoin.q to fail on windows.



[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2911:
---

Status: Patch Available  (was: Open)

> GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
> 
>
> Key: PIG-2911
> URL: https://issues.apache.org/jira/browse/PIG-2911
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: PIG-2911.1.patch
>
>
> This causes testcase skewjoin.q to fail on windows.



[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2911:
---

Attachment: PIG-2911.1.patch

> GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
> 
>
> Key: PIG-2911
> URL: https://issues.apache.org/jira/browse/PIG-2911
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: PIG-2911.1.patch
>
>
> This causes testcase skewjoin.q to fail on windows.



[jira] [Created] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-2911:
--

 Summary: GenMRSkewJoinProcessor uses File.Separator instead of 
Path.Separator
 Key: PIG-2911
 URL: https://issues.apache.org/jira/browse/PIG-2911
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-2911.1.patch

This causes testcase skewjoin.q to fail on windows.
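
To illustrate why the choice of separator matters: java.io.File.separator is platform dependent ("\" on Windows), while Hadoop file system paths always use "/", which is the value of Hadoop's Path.SEPARATOR. The sketch below is JDK-only and mirrors that constant locally; the class, constant and method names are hypothetical and are not taken from the attached patch.

```java
import java.io.File;

public class HdfsPathJoin {

    // Mirrors the value of Hadoop's Path.SEPARATOR so the sketch needs no Hadoop jar.
    static final String HDFS_SEPARATOR = "/";

    /** Joins path components with "/" regardless of the local platform. */
    static String joinHdfsPath(String... parts) {
        return String.join(HDFS_SEPARATOR, parts);
    }

    public static void main(String[] args) {
        // Platform dependent: "\" on Windows, "/" on Unix-like systems.
        System.out.println("File.separator here: " + File.separator);
        // Always joined with "/", whatever the platform.
        System.out.println(joinHdfsPath("tmp", "skewjoin", "part-00000"));
    }
}
```

Building the same path with File.separator would give "tmp\skewjoin\part-00000" on Windows, which is not a valid Hadoop path.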




[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk.

> jodatime jar missing in pig-withouthadoop.jar
> -
>
> Key: PIG-2895
> URL: https://issues.apache.org/jira/browse/PIG-2895
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: PIG-2895.1.patch, PIG-2895.2.patch
>
>
> jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
> is used, pig will fail with class not found error.



[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Status: Patch Available  (was: Open)

Patch tested against a multi-node hadoop cluster.


> jodatime jar missing in pig-withouthadoop.jar
> -
>
> Key: PIG-2895
> URL: https://issues.apache.org/jira/browse/PIG-2895
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: PIG-2895.1.patch, PIG-2895.2.patch
>
>
> jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
> is used, pig will fail with class not found error.



[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Attachment: PIG-2895.2.patch

PIG-2895.2.patch - adds "org/joda/time" to pigPackagesToSend.
Thanks Julien for the directions!


> jodatime jar missing in pig-withouthadoop.jar
> -
>
> Key: PIG-2895
> URL: https://issues.apache.org/jira/browse/PIG-2895
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: PIG-2895.1.patch, PIG-2895.2.patch
>
>
> jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
> is used, pig will fail with class not found error.



[jira] [Assigned] (PIG-2893) fix DBStorage compile issue

2012-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned PIG-2893:
--

Assignee: Thejas M Nair

> fix DBStorage compile issue
> ---
>
> Key: PIG-2893
> URL: https://issues.apache.org/jira/browse/PIG-2893
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: PIG-2893.1.patch
>
>
> DBStorage does not compile after the datetime patch was committed. The joda 
> datetime was passed as argument to java.sql.PreparedStatement.setDate() 
> instead of java.sql.Date .



[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443630#comment-13443630
 ] 

Thejas M Nair commented on PIG-1314:


Yes, that was not intentional. Deleted JobControlCompiler.java.orig in svn.


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012



[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Attachment: PIG-2895.1.patch

> jodatime jar missing in pig-withouthadoop.jar
> -
>
> Key: PIG-2895
> URL: https://issues.apache.org/jira/browse/PIG-2895
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: PIG-2895.1.patch
>
>
> jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
> is used, pig will fail with class not found error.



[jira] [Created] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-28 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-2895:
--

 Summary: jodatime jar missing in pig-withouthadoop.jar
 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11


jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
is used, pig will fail with class not found error.




[jira] [Commented] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442922#comment-13442922
 ] 

Thejas M Nair commented on PIG-2893:


The compile error was - 
[javac] symbol  : method setDate(int,java.util.Date)
[javac] location: interface java.sql.PreparedStatement
[javac] ps.setDate(sqlPos, ((DateTime) field).toDate());
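
A minimal sketch of the kind of conversion that resolves this error, assuming the value is available as a java.util.Date (or as epoch millis, e.g. from a Joda DateTime via getMillis()). The class and method names are hypothetical and are not taken from the patch:

```java
final class SqlDateConversion {
    // PreparedStatement.setDate(int, java.sql.Date) does not accept a
    // java.util.Date, so wrap the epoch millis in a java.sql.Date first.
    static java.sql.Date toSqlDate(java.util.Date value) {
        return new java.sql.Date(value.getTime());
    }

    // Same conversion starting from epoch millis (hypothetical helper).
    static java.sql.Date toSqlDate(long epochMillis) {
        return new java.sql.Date(epochMillis);
    }
}
```

With a PreparedStatement this would be used as ps.setDate(sqlPos, SqlDateConversion.toSqlDate(value)). If the column holds a full timestamp, setTimestamp with a java.sql.Timestamp may be the better fit.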

> fix DBStorage compile issue
> ---
>
> Key: PIG-2893
> URL: https://issues.apache.org/jira/browse/PIG-2893
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
> Attachments: PIG-2893.1.patch
>
>
> DBStorage does not compile after the datetime patch was committed. The joda 
> datetime was passed as argument to java.sql.PreparedStatement.setDate() 
> instead of java.sql.Date.



[jira] [Created] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-2893:
--

 Summary: fix DBStorage compile issue
 Key: PIG-2893
 URL: https://issues.apache.org/jira/browse/PIG-2893
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Attachments: PIG-2893.1.patch

DBStorage does not compile after the datetime patch was committed. The joda 
datetime was passed as argument to java.sql.PreparedStatement.setDate() instead 
of java.sql.Date.




[jira] [Updated] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2893:
---

Attachment: PIG-2893.1.patch

PIG-2893.1.patch - fix for compile issue, updates to test case to use datetime 
type.


> fix DBStorage compile issue
> ---
>
> Key: PIG-2893
> URL: https://issues.apache.org/jira/browse/PIG-2893
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
> Attachments: PIG-2893.1.patch
>
>
> DBStorage does not compile after the datetime patch was committed. The joda 
> datetime was passed as argument to java.sql.PreparedStatement.setDate() 
> instead of java.sql.Date.



[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440858#comment-13440858
 ] 

Thejas M Nair commented on PIG-1314:


We also need some test cases that set the timezone property. This might not be 
easy to do in the e2e framework, so unit test cases are a better candidate for 
this. Please let me know if you need any help.


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440851#comment-13440851
 ] 

Thejas M Nair commented on PIG-1314:


PIG-1314-7.patch committed to trunk! Thanks Zhijie.

We need to update the documentation for this change. Can you please upload a 
new patch for that? To see the generated docs, run - ant 
-Dforrest.home= docs. The files to be edited are 
under - trunk/src/docs/src/documentation/ .

We should also add a few end-to-end test cases for datetime. See 
https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-EndtoendTesting
 . We should have a few queries that do some of the basic operations on 
datetime, and queries that use order-by, group and join on date fields. 
These can be submitted as multiple patches.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Updated] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version

2012-08-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2811:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fixed in the PIG-1314 patch.


> Updating .eclipse.templates/.classpath with the Newest Jython Version
> -
>
> Key: PIG-2811
> URL: https://issues.apache.org/jira/browse/PIG-2811
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Trivial
> Fix For: 0.11
>
> Attachments: PIG-2811.patch
>
>
> Jython library version has been upgraded to 2.5.2 by the PIG-2665 patch, but 
> the related modification is not made in the Eclipse template file.





[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440789#comment-13440789
 ] 

Thejas M Nair commented on PIG-2662:


Koji, what OS and JVM are you using?


> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-16 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Fix Version/s: 0.11
 Assignee: Rajesh Balamohan
   Status: Patch Available  (was: Open)

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.2
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-16 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1  Patch committed to trunk.
Thanks Rajesh!


> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434733#comment-13434733
 ] 

Thejas M Nair commented on PIG-1314:


bq. 2. According to your last response, I'm not clear how the default timezone 
of client can be sent to the server with the code. In my opinion, the default 
timezone should be specified on the server side by configuration, which should 
be taken care of by administrators. How do you think about this.

I believe you should be able to set the default timezone property in the 
PigContext constructor, and also let the user override the default. In the 
backend, you can access the value using something like - 
PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
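
A rough sketch of that flow, assuming a plain java.util.Properties object stands in for both the PigContext properties and the job configuration, and java.time stands in for Joda-Time. Only the property name comes from the comment above; the class and method names are hypothetical:

```java
import java.time.ZoneId;
import java.util.Properties;

final class DefaultTimeZoneConfig {
    static final String TZ_KEY = "pig.datetime.default.tz";

    // Client side: record the client's default zone, unless the user has
    // already set the property explicitly (user override wins).
    static void setClientDefault(Properties props, ZoneId clientDefault) {
        if (props.getProperty(TZ_KEY) == null) {
            props.setProperty(TZ_KEY, clientDefault.getId());
        }
    }

    // Backend side: use the propagated zone; fall back to the node's own
    // default only if nothing was propagated.
    static ZoneId resolve(Properties props) {
        String id = props.getProperty(TZ_KEY);
        return id == null ? ZoneId.systemDefault() : ZoneId.of(id);
    }
}
```

On the client this would be called once with ZoneId.systemDefault(), so that every task resolves the same zone regardless of how the task node is configured.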


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434294#comment-13434294
 ] 

Thejas M Nair commented on PIG-2662:


Rajesh, 
With the patch, the TestPoissonSampleLoader test cases fail. Can you please 
take a look? 
Please let me know if you need any help with that.


> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Attachment: PIG-2662.2.patch

PIG-2662.2.patch - This patch fixes the compile error in the previous one (the 
conf variable was not declared). Running tests with this one. 

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-03 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428458#comment-13428458
 ] 

Thejas M Nair commented on PIG-1314:


bq. 1. You've mentioned that we need to propagate the timezone from the client 
to backend, where the udfs get executed. How the timezone should be propagated 
to the backend, which I assume the machine that runs the code? 
Yes
bq. Previously I made the timezone setting in pig.properties, which will be 
loaded when PigServer runs, such that the default timezone will be set. 
Consequently, if a datetime object is created without specifying the timezone, 
the default one will be used. However, do you mean some other way?
It is possible that some of the task nodes might be misconfigured and have a 
different default time zone. In such cases, the results won't be what you want 
and it will be very difficult to debug. So the default timezone on the client 
should be used on the nodes as well. 

bq. I convert the location-based timezone to the utc-offset one and only use 
utc-offset style internally. Therefore, the aforementioned two equal datetime 
objects will not be mis-treated.
Sounds good.
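
To illustrate the utc-offset normalization discussed above, here is a small sketch using java.time rather than Joda-Time (an assumption made to keep the example self-contained; the class name is hypothetical). It converts a location-based zone to the fixed offset in effect at that instant:

```java
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;

final class OffsetNormalizer {
    // Replace a region zone such as America/Los_Angeles with the UTC offset
    // that applies at this particular instant (daylight saving included).
    static OffsetDateTime toOffsetStyle(ZonedDateTime value) {
        return value.toOffsetDateTime();
    }
}
```

After normalization, a value created with a region zone and a value created with the matching numeric offset compare as equal, since both carry the same local time and the same offset.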


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch, 
> PIG-1314-4.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively

2012-07-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424140#comment-13424140
 ] 

Thejas M Nair commented on PIG-2829:


Thanks for the benchmark, Jie. Clearly, partial-agg is working better than the 
combiner. 
Can you also run some benchmarks with the combiner turned off, so that we can 
verify the appropriate value for pig.exec.mapPartAgg.minReduction - 

||query || combiner off, partial-agg off || combiner off, partial-agg on ||
|g-by with reduction by 3 | | |
|g-by with reduction by 2| | |


> Use partial aggregation more aggresively
> 
>
> Key: PIG-2829
> URL: https://issues.apache.org/jira/browse/PIG-2829
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Jie Li
> Attachments: 2829.1.patch, 2829.2.patch, 2829.separate.options.patch, 
> pigmix-10G.png, tpch-10G.png
>
>
> Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature 
> in Pig 0.10 that will perform aggregation within map function. The main 
> advantage against combiner is it avoids de/serializing and sorting the data, 
> and it can auto disable itself if the data reduction rate is low. Currently 
> it's disabled by default.
> To leverage the power of PartialAgg more aggressively, several things need to 
> be revisited:
> 1. The threshold of auto-disabling. Currently each mapper looks at first 1k 
> (hard-coded) records to see if there's enough data size reduction (defaults 
> to 10x, configurable). The check would happen earlier if the hash table gets 
> full before processing the 1k records (hash table size is controlled by 
> pig.cachedbag.memusage). We might want to relax these thresholds.
> 2. Dependency on the combiner. Currently the PartialAgg won't work without a 
> combiner following it, so we need to provide separate options to enable each 
> independently. 





[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively

2012-07-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423602#comment-13423602
 ] 

Thejas M Nair commented on PIG-2829:


I will review the patch soon. Some comments regarding the default configuration 
- 

bq. 2: changes existing default values: 
After thinking about the multi-query use case, where you can have multiple 
POPartialAgg operators in a map task, I am having second thoughts on turning 
partial agg on by default. Can you try these settings on queries where there 
are around 10+ group+agg that get combined into a single MR job? Maybe we 
should address the potential OOM issues for this use case before we change the 
defaults. This is likely to become a bigger issue when we use 100k records to 
decide whether to turn the partial aggregation on or off.

bq. 3: adds a property pig.exec.mapPartAgg.reduction.checkinterval which 
defaults to 100k, so after processing every 100k records mapagg will check the 
reduction rate to see if it should be disabled. Previously we only look at 
first 1000 records.
Can you do some benchmarks to see if there is any noticeable difference in 
runtime because of the delay in turning mapPartAgg off?
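
For reference, the periodic check being discussed can be sketched roughly as below. This is not the POPartialAgg code; the class, its fields and the way output records are counted are assumptions made for illustration. Only the idea of re-checking the reduction rate every N input records comes from the quoted proposal:

```java
final class ReductionMonitor {
    private final long checkInterval;   // e.g. 100_000 in the proposal
    private final double minReduction;  // e.g. 10.0 for a 10x reduction
    private long recordsIn;
    private long recordsOut;
    private boolean enabled = true;

    ReductionMonitor(long checkInterval, double minReduction) {
        if (checkInterval < 1) {
            throw new IllegalArgumentException("checkInterval must be >= 1");
        }
        this.checkInterval = checkInterval;
        this.minReduction = minReduction;
    }

    // Call once per input record; createdNewGroup is true when the record
    // added a new entry to the hash table instead of being aggregated.
    void observe(boolean createdNewGroup) {
        if (!enabled) {
            return;
        }
        recordsIn++;
        if (createdNewGroup) {
            recordsOut++;
        }
        // Re-evaluate after every checkInterval records, not only once.
        if (recordsIn % checkInterval == 0 && recordsOut > 0) {
            double reduction = (double) recordsIn / recordsOut;
            if (reduction < minReduction) {
                enabled = false;
            }
        }
    }

    boolean isEnabled() {
        return enabled;
    }
}
```

With a large interval, a mapper whose data does not reduce well keeps paying the hash-table cost until the first check fires, which is the delay the benchmark question above is about.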

> Use partial aggregation more aggresively
> 
>
> Key: PIG-2829
> URL: https://issues.apache.org/jira/browse/PIG-2829
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Jie Li
> Attachments: 2829.1.patch, 2829.separate.options.patch, 
> pigmix-10G.png, tpch-10G.png
>
>
> Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature 
> in Pig 0.10 that will perform aggregation within map function. The main 
> advantage against combiner is it avoids de/serializing and sorting the data, 
> and it can auto disable itself if the data reduction rate is low. Currently 
> it's disabled by default.
> To leverage the power of PartialAgg more aggressively, several things need to 
> be revisited:
> 1. The threshold of auto-disabling. Currently each mapper looks at first 1k 
> (hard-coded) records to see if there's enough data size reduction (defaults 
> to 10x, configurable). The check would happen earlier if the hash table gets 
> full before processing the 1k records (hash table size is controlled by 
> pig.cachedbag.memusage). We might want to relax these thresholds.
> 2. Dependency on the combiner. Currently the PartialAgg won't work without a 
> combiner following it, so we need to provide separate options to enable each 
> independently. 





[jira] [Commented] (PIG-2826) Training link on front page no longer points to Pig training

2012-07-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419589#comment-13419589
 ] 

Thejas M Nair commented on PIG-2826:


+1

> Training link on front page no longer points to Pig training
> 
>
> Key: PIG-2826
> URL: https://issues.apache.org/jira/browse/PIG-2826
> Project: Pig
>  Issue Type: Bug
>  Components: site
>Affects Versions: site
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: site
>
> Attachments: PIG-2826.patch
>
>
> The training link on Pig's website used to point to a Pig specific video on 
> Cloudera's site.  It now points to a list of all their videos.  Also, at the 
> time they were the only ones providing training videos for Hadoop.  Now other 
> vendors do as well.  This link should be replaced by a link to a wiki page 
> where vendors who wish to can list their training resources.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412452#comment-13412452
 ] 

Thejas M Nair commented on PIG-1314:


Zhijie,
I have added comments on your latest patch in 
https://reviews.apache.org/r/5414/.
Yes, let's focus on test cases now, so that we can get an initial version 
committed. 

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch, 
> joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-03 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405971#comment-13405971
 ] 

Thejas M Nair commented on PIG-1314:


PigStorage is meant to be a human-readable format, so that is another reason to 
store the timestamp as an ISO string, as you suggested. 
Yes, if the timezone is specified in the string, pig should use that value. But 
the timezone part and the time part of the datetime string should be optional. 
Does jodatime support that?


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403480#comment-13403480
 ] 

Thejas M Nair commented on PIG-2774:


bq. we might have other operations queued up after the join 

In the 2nd approach, the operations within the map task don't complicate 
things. But to handle a reduce after the merge-join, we would need to introduce 
another map task that does a union of the merge-join results. For example, if 
the merge-join is followed by a group+agg, then the following transformation of 
the plan would be needed. 
Map(Merge-join + group+agg ops) + Reduce(group+agg ops)  
 => Map (merge-join wave 1 + group+agg ops)  + Map (merge-join wave 2 + 
group+agg ops) + Map(union of 1st 2 maps) + Reduce(group+agg ops)

This transformation can't happen dynamically - we can't decide to skip the 
reduce while in the map phase. 


To handle this case dynamically, it looks like the first approach is the one 
that would actually work! The user or a metadata system could possibly identify 
the skew problem and recommend using a 'skew-merge' join the next time a query 
is run on similar data.



> Fix merge join to work with many duplicate left keys
> 
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
>  Issue Type: Bug
>Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is 
> large as it accumulates all of them in memory. There are two solutions around 
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side 
> index.





[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403461#comment-13403461
 ] 

Thejas M Nair commented on PIG-2774:


bq. I'd like to avoid having the user encode these details in the pig script. 

Floating some more ideas -

A more performant way of doing this would be to stop accumulating tuples for a
join key value from the left relation in memory once a certain memory threshold
is exceeded. Once the join of these tuples against the right relation is done,
discard the accumulated left-relation tuples for the join key, load a new set,
go back to the start of the tuples with this join key in the right relation,
and continue.
To go back to the start of the join key in the right relation more efficiently,
we can keep track of its record offset. This approach needs no additional
writes and has less IO overall. The right relation block hopefully stays in the
OS cache.
But this approach can result in some map tasks being much slower than others.
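
A minimal sketch of the chunk-and-re-seek idea above, in Python rather than
Pig's Java. It is not the Pig implementation: the function name, the use of
in-memory lists of (key, value) pairs, and a tuple count standing in for the
memory threshold are all assumptions made for illustration. List indexes play
the role of the right-side record offset.

```python
def chunked_merge_join(left, right, max_left_tuples):
    """Merge-join two key-sorted lists of (key, value) pairs.

    At most max_left_tuples left values for one key are held at a time;
    the right-side run for that key is rescanned once per chunk.
    """
    if max_left_tuples < 1:
        raise ValueError("max_left_tuples must be at least 1")
    out = []
    li = ri = 0
    while li < len(left) and ri < len(right):
        lkey, rkey = left[li][0], right[ri][0]
        if lkey < rkey:
            li += 1
            continue
        if lkey > rkey:
            ri += 1
            continue
        key = lkey
        right_start = ri   # remembered offset of this key's first right tuple
        right_end = ri
        # Consume the left run for this key one bounded chunk at a time.
        while li < len(left) and left[li][0] == key:
            chunk = []
            while (li < len(left) and left[li][0] == key
                   and len(chunk) < max_left_tuples):
                chunk.append(left[li][1])
                li += 1
            # "Seek" back to the start of the key on the right and rescan.
            rj = right_start
            while rj < len(right) and right[rj][0] == key:
                for lval in chunk:
                    out.append((key, lval, right[rj][1]))
                rj += 1
            right_end = rj
        ri = right_end
    return out
```

The output contains the same tuples as a plain merge join, but grouped by left
chunk, so the order within a key differs. The right-side run is read once per
chunk, which is where the extra time for heavily skewed keys comes from.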

Another option is to write the left side join key values that didn't fit into
memory to HDFS in separate files, one file for each chunk that is expected to
fit into memory, and have another round of MR jobs do the merge join on these
files. (I think Hive has a skew join implementation along similar lines.) This
would involve changing the MR plan at runtime.








[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402722#comment-13402722
 ] 

Thejas M Nair commented on PIG-2774:


If the left-side relation's tuples for a value of the join key are serialized
to disk, then for every value of the join key in the right relation, it will
hit the disk. That will perform very poorly.
It looks like what we need is something like a merge-skew join, i.e., similar
to skew join: sample the left side, and partition the splits for map tasks
based on the sampled information.
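
One way the sampling step could feed the split planning, as a rough Python
sketch. Everything here is hypothetical: the function name, the one-in-N
sampling scheme, and the per-task tuple budget are assumptions for
illustration, not how Pig's skew join or any proposed patch works.

```python
from collections import Counter


def plan_tasks_for_skewed_keys(sampled_keys, sample_every, max_tuples_per_task):
    """Estimate how many map tasks each heavy left-side key needs.

    sampled_keys: join keys seen when reading one in every `sample_every`
    left tuples. Returns {key: task_count} for keys needing more than one task.
    """
    if sample_every < 1 or max_tuples_per_task < 1:
        raise ValueError("sample_every and max_tuples_per_task must be >= 1")
    plan = {}
    for key, seen in Counter(sampled_keys).items():
        estimated = seen * sample_every                   # scale sample up
        tasks = -(-estimated // max_tuples_per_task)      # ceiling division
        if tasks > 1:
            plan[key] = tasks
    return plan
```

Keys missing from the returned plan are assumed to fit in a single task's
memory budget. Each task assigned to a heavy key would still need to read the
full right-side run for that key.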





