
[jira] [Commented] (PIG-5112) Cleanup pig-template.xml

2017-01-25 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838166#comment-15838166
 ] 

Thejas M Nair commented on PIG-5112:


+1

> Cleanup pig-template.xml
> 
>
> Key: PIG-5112
> URL: https://issues.apache.org/jira/browse/PIG-5112
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-5112-1.patch
>
>
> Several entries in pig-template.xml are outdated. Attaching a patch to remove or 
> update those entries. Later we shall use ivy:makepom to generate pig.pom and 
> the lib dir; I will open a separate ticket for that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PIG-4972) StreamingIO_1 fail on perl 5.22

2016-08-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450030#comment-15450030
 ] 

Thejas M Nair commented on PIG-4972:


+1

> StreamingIO_1 fail on perl 5.22
> ---
>
> Key: PIG-4972
> URL: https://issues.apache.org/jira/browse/PIG-4972
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-4972-1.patch
>
>
> Saw StreamingIO_1 fail on a particular perl version due to a warning in 
> PigStreaming.pl. You can see the warning in any version of perl using "perl 
> -w":
> {code}
> defined(%hash) is deprecated at streaming/PigStreaming.pl line 76.
>   (Maybe you should just omit the defined()?)
> {code}
> In some versions of perl, the warning check is mandatory and the perl 
> script simply fails.




[jira] [Commented] (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2015-08-12 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694026#comment-14694026
]

Thejas M Nair commented on PIG-1472:

I don't remember if I had looked into WritableUtils.writeVInt back then, or if
it was available with the Pig version being used back then (it's been 5 years!
:) ).
Would using WritableUtils.writeVInt mean that an extra byte needs to be used
for storing the type, i.e. bag vs. map vs. tuple?
For complex types, savings are more noticeable for smaller sizes. For a bag of
size 32768, a one-byte saving won't be significant. However, for an int with
the value 32768, the saving of one byte is significant.

Optimize serialization/deserialization between Map and Reduce and between MR
jobs
-

Key: PIG-1472
URL: https://issues.apache.org/jira/browse/PIG-1472
Project: Pig
Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0

Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.4.patch,
PIG-1472.patch

In certain types of Pig queries, most of the execution time is spent in
serializing/deserializing (sedes) records between Map and Reduce and between
MR jobs.
For example, if PigMix queries are modified to specify types for all the
fields in the load statement schema, some of the queries (L2, L3, L9, L10 in
PigMix v1) that have records with bags and maps being transmitted across map
or reduce boundaries run a lot longer (a runtime increase of a few times has
been seen).
There are a few optimizations that have been shown to improve the performance
of sedes in my tests -
1. Use a smaller number of bytes to store the length of the column. For
example, if a bytearray is smaller than 255 bytes, a byte can be used to store
the length instead of the integer that is currently used.
2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and
DataInput.readUTF. This reduces the cost of serialization by more than 1/2.
Zebra and BinStorage are known to use DefaultTuple sedes functionality. The
serialization format that these loaders use cannot change, so after the
optimization their format is going to be different from the format used
between M/R boundaries.
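
Optimization 1 above can be sketched in Java roughly as follows. This is a
minimal illustration and not the code from the attached patches: the class
name, the two type-marker constants, and the helper methods are hypothetical.
The marker byte is assumed to encode both the type and the width of the length
field, so the short form needs no additional byte for the type.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CompactLengthSketch {
    // Hypothetical type markers; real marker values would come from the sedes format.
    public static final byte TINY_BYTEARRAY = 1; // length stored in 1 byte
    public static final byte BYTEARRAY = 2;      // length stored in a 4-byte int

    /** Writes marker, length, and payload, using a 1-byte length when it fits. */
    public static void writeByteArray(DataOutputStream out, byte[] data) throws IOException {
        if (data.length <= 255) {
            out.writeByte(TINY_BYTEARRAY);
            out.writeByte(data.length); // low 8 bits; read back with readUnsignedByte()
        } else {
            out.writeByte(BYTEARRAY);
            out.writeInt(data.length);
        }
        out.write(data);
    }

    /** Returns the number of bytes the encoded form occupies. */
    public static int encodedSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeByteArray(out, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }
}
```

With this layout a 10-byte array takes 12 bytes on the wire (marker, 1-byte
length, payload) instead of the 15 it would take with a 4-byte length.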


[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

The ORC issue should be addressed separately in ORC/Hive; however, it would be 
good if Pig could handle this case for already generated files.

Attaching patch from [~daijy].


 pig errors out on ORC empty file without schema
 ---

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without schema (which ideally, it is not 
 supposed to), then pig query reading the data gives the following error - 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Created] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-4624:
--

 Summary: pig errors out on ORC empty file without schema
 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai


If ORC produces an empty file without schema (which ideally, it is not supposed 
to), then pig query reading the data gives the following error - 
org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Fix Version/s: 0.15.1
   0.16.0

 pig errors out on ORC empty file without schema
 ---

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1


 If ORC produces an empty file without schema (which ideally, it is not 
 supposed to), then pig query reading the data gives the following error - 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

 Error on ORC empty file without schema
 --

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without schema (which ideally, it is not 
 supposed to), then pig query reading the data gives the following error - 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: (was: PIG-4624.1.patch)

 Error on ORC empty file without schema
 --

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without schema (which ideally, it is not 
 supposed to), then pig query reading the data gives the following error - 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Summary: Error on ORC empty file without schema  (was: pig errors out on 
ORC empty file without schema)

 Error on ORC empty file without schema
 --

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without schema (which ideally, it is not 
 supposed to), then pig query reading the data gives the following error - 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: (was: PIG-4624.1.patch)

 Error on ORC empty file without schema
 --

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without schema (which ideally, it is not 
 supposed to), then pig query reading the data gives the following error - 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

 Error on ORC empty file without schema
 --

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without schema (which ideally, it is not 
 supposed to), then pig query reading the data gives the following error - 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 




[jira] [Commented] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555165#comment-14555165
 ] 

Thejas M Nair commented on PIG-4556:


I believe s3 is used in non-local modes as well.
The change looks good to me. (For my reference, since it took me a few minutes 
to figure out: the real change is in the order in which params are passed to 
ConfigurationUtil.mergeConf.)
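
Why argument order matters in a merge can be shown with a small sketch. This 
is not Pig's ConfigurationUtil.mergeConf; the merge helper and the property 
values below are hypothetical, and the sketch assumes only that entries from 
the second argument override those of the first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeOrderSketch {
    /**
     * Hypothetical merge: copies base, then lets overrides replace
     * any key that appears in both maps.
     */
    public static Map<String, String> merge(Map<String, String> base,
                                            Map<String, String> overrides) {
        Map<String, String> result = new LinkedHashMap<>(base);
        result.putAll(overrides);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> siteConf = new LinkedHashMap<>();
        siteConf.put("fs.defaultFS", "hdfs://cluster"); // illustrative value
        Map<String, String> localConf = new LinkedHashMap<>();
        localConf.put("fs.defaultFS", "file:///");      // illustrative value

        // The same two inputs give different results depending on the order.
        System.out.println(merge(siteConf, localConf).get("fs.defaultFS"));
        System.out.println(merge(localConf, siteConf).get("fs.defaultFS"));
    }
}
```

Swapping the two arguments silently changes which configuration wins, which 
is why a one-line reordering can be the whole fix.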


 Local mode is broken in some case by PIG-4247
 -

 Key: PIG-4556
 URL: https://issues.apache.org/jira/browse/PIG-4556
 Project: Pig
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.15.0

 Attachments: PIG-4556-1.patch, PIG-4556-2.patch


 HExecutionEngine.getS3Conf is wrong. It should only return s3 config. 
 Currently it will return all the properties, including *-site.xml even in 
 local mode. In one particular case, mapred-site.xml contains 
 mapreduce.application.framework.path, which ends up in the local mode 
 config, thus we see the exception:
 {code}
 Message: java.io.FileNotFoundException: File 
 file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
   at 
 org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
   at 
 org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
   at java.lang.Thread.run(Thread.java:745)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
 {code}




[jira] [Created] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change

2015-04-22 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-4514:
--

 Summary: pig trunk compilation is broken - 
VertexManagerPluginContext.reconfigureVertex change
 Key: PIG-4514
 URL: https://issues.apache.org/jira/browse/PIG-4514
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0



{code}
src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173:
 error: exception TezException is never thrown in body of corresponding try 
statement
[javac] } catch (TezException e) {
[javac]   ^
{code}





[jira] [Updated] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change

2015-04-22 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4514:
---
Attachment: PIG-4514.1.patch

 pig trunk compilation is broken - 
 VertexManagerPluginContext.reconfigureVertex change
 -

 Key: PIG-4514
 URL: https://issues.apache.org/jira/browse/PIG-4514
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0

 Attachments: PIG-4514.1.patch


 {code}
 src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173:
  error: exception TezException is never thrown in body of corresponding try 
 statement
 [javac] } catch (TezException e) {
 [javac]   ^
 {code}




[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498325#comment-14498325
 ] 

Thejas M Nair commented on PIG-4509:


[~rohini] This results in a compilation failure. 

{code}
src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java:105:
 error: unreported exception Throwable; must be caught or declared to be thrown
[javac] throw e;
[javac] ^
{code}
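
For illustration only: one way to keep the cleanup while avoiding a bare 
rethrow of a Throwable is to wrap the failure in an exception type the method 
already declares. The sketch below is not the code from the attached patches; 
Client is a hypothetical stand-in for the Tez client, and its method names 
merely mirror the issue description.

```java
import java.io.IOException;

public class RethrowSketch {
    /** Hypothetical stand-in for a client that must be stopped on failure. */
    public static class Client {
        public boolean stopped = false;

        public void waitTillReady() throws InterruptedException {
            // Simulates being interrupted while waiting, e.g. on shutdown.
            throw new InterruptedException("interrupted on shutdown");
        }

        public void stop() {
            stopped = true;
        }
    }

    /** Stops the client on any failure, then reports it as a declared exception. */
    public static void start(Client client) throws IOException {
        try {
            client.waitTillReady();
        } catch (Throwable t) {
            client.stop();
            // Wrapping avoids "throw t", which would require the method to
            // declare Throwable when the compiler cannot narrow the type.
            throw new IOException("client did not become ready", t);
        }
    }
}
```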

 [Pig on Tez] Unassigned applications not killed on shutdown
 ---

 Key: PIG-4509
 URL: https://issues.apache.org/jira/browse/PIG-4509
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.15.0

 Attachments: PIG-4509-1.patch


  tezclient.stop() should be called when tezClient.waitTillReady() is 
 interrupted on shutdown.




[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498812#comment-14498812
 ] 

Thejas M Nair commented on PIG-4509:


+1
The change looks good to me.
Thanks Rohini!



 [Pig on Tez] Unassigned applications not killed on shutdown
 ---

 Key: PIG-4509
 URL: https://issues.apache.org/jira/browse/PIG-4509
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.15.0

 Attachments: PIG-4509-1.patch, PIG-4509-FixCompileError.patch


  tezclient.stop() should be called when tezClient.waitTillReady() is 
 interrupted on shutdown.




[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498787#comment-14498787
 ] 

Thejas M Nair commented on PIG-4509:


It builds fine on my Mac as well with JDK 7. However, it is failing with JDK 7 
in our internal build environment (probably Linux).

The fact that it passes in some setups is certainly very strange. I think we 
should still go ahead and fix this; as far as I know, this should result in a 
syntax error.


 [Pig on Tez] Unassigned applications not killed on shutdown
 ---

 Key: PIG-4509
 URL: https://issues.apache.org/jira/browse/PIG-4509
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.15.0

 Attachments: PIG-4509-1.patch


  tezclient.stop() should be called when tezClient.waitTillReady() is 
 interrupted on shutdown.




[jira] [Created] (PIG-4486) set Tez ACLs appropriately in hive

2015-03-30 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-4486:
--

 Summary: set Tez ACLs appropriately in hive
 Key: PIG-4486
 URL: https://issues.apache.org/jira/browse/PIG-4486
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair


Hive should make the necessary changes to integrate with Tez and Timeline. It 
should pass the necessary ACL-related params to ensure that query execution and 
logs are only visible to the relevant users.

Proposed Changes -
Set session level tez ACL for a super user, to allow modify + view
Set DAG level ACL for user running the query (the end user), to allow modify + 
view
Determining the super user -
Super user can be configured using hive.tez.admin.user. This can be 
initialized by the Authorization implementation (such as sql standard 
authorization) if it is not already set to a specific value. SQL standard 
authorization would initialize it, if it is unset, to the sql standard admin 
user.





[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4331:
---
Description: Pig queries can be run using tez, by specifying pig -x tez. 
The output of pig --help needs to be updated to indicate that.  (was: Pig 
queries can be run using tez, by specifying pig -x tez. But usage does not 
indicate this.
)

 update README, '-x' option in usage to include tez
 --

 Key: PIG-4331
 URL: https://issues.apache.org/jira/browse/PIG-4331
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0


 Pig queries can be run using tez, by specifying pig -x tez. The output of 
 pig --help needs to be updated to indicate that.




[jira] [Created] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-4331:
--

 Summary: update README, '-x' option in usage to include tez
 Key: PIG-4331
 URL: https://issues.apache.org/jira/browse/PIG-4331
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0


Pig queries can be run using tez, by specifying pig -x tez. But usage does 
not indicate this.





[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4331:
---
Attachment: PIG-4331.1.patch

 update README, '-x' option in usage to include tez
 --

 Key: PIG-4331
 URL: https://issues.apache.org/jira/browse/PIG-4331
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0

 Attachments: PIG-4331.1.patch


 Pig queries can be run using tez, by specifying pig -x tez. The output of 
 pig --help needs to be updated to indicate that.




[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez

2014-11-14 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4331:
---
Status: Patch Available  (was: Open)

 update README, '-x' option in usage to include tez
 --

 Key: PIG-4331
 URL: https://issues.apache.org/jira/browse/PIG-4331
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.15.0

 Attachments: PIG-4331.1.patch


 Pig queries can be run using tez, by specifying pig -x tez. The output of 
 pig --help needs to be updated to indicate that.




[jira] [Commented] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209090#comment-14209090
 ] 

Thejas M Nair commented on PIG-4328:


+1

 Upgrade Hive to 0.14
 

 Key: PIG-4328
 URL: https://issues.apache.org/jira/browse/PIG-4328
 Project: Pig
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4328-1.patch


 Hive 0.14.0 artifacts are available. We shall switch to use the released 
 version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190713#comment-14190713
 ] 

Thejas M Nair commented on PIG-4160:


Daniel, can you also include the doc edits ? 
Also the commented line #HADOOP_OPTS can be deleted.


 -forcelocaljars / -j flag when using a remote url for a script
 --

 Key: PIG-4160
 URL: https://issues.apache.org/jira/browse/PIG-4160
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.14.0
Reporter: Andrew C. Oliver
  Labels: patch
 Fix For: 0.14.0

 Attachments: PIG-4160-2.patch, forcelocal.trunk.patch, 
 forcelocal.withtests.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 patch adds a -j/forcelocaljars flag which if enabled allows you to do 
 pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
 thus loading the pig script REMOTELY 
 while loading the jar files LOCALLY
 One does this to avoid a single point of failure but avoid one central 
 interversion dependent repository for all the jars across all teams/projects.




[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190819#comment-14190819
 ] 

Thejas M Nair commented on PIG-4160:


I think it would be useful to retain mention of the old additional.jars 
parameter in the web docs, saying that it is similar to .jars.comma but 
separated by the OS path character and does not allow a scheme to be 
specified (and that the use of jars.comma is preferred).


 -forcelocaljars / -j flag when using a remote url for a script
 --

 Key: PIG-4160
 URL: https://issues.apache.org/jira/browse/PIG-4160
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.14.0
Reporter: Andrew C. Oliver
  Labels: patch
 Fix For: 0.14.0

 Attachments: PIG-4160-2.patch, PIG-4160-3.patch, 
 forcelocal.trunk.patch, forcelocal.withtests.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 patch adds a -j/forcelocaljars flag which if enabled allows you to do 
 pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
 thus loading the pig script REMOTELY 
 while loading the jar files LOCALLY
 One does this to avoid a single point of failure but avoid one central 
 interversion dependent repository for all the jars across all teams/projects.




[jira] [Commented] (PIG-4160) -forcelocaljars / -j flag when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190821#comment-14190821
 ] 

Thejas M Nair commented on PIG-4160:


I think we should also change the jira title and description to indicate what 
is being implemented.


 -forcelocaljars / -j flag when using a remote url for a script
 --

 Key: PIG-4160
 URL: https://issues.apache.org/jira/browse/PIG-4160
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.14.0
Reporter: Andrew C. Oliver
  Labels: patch
 Fix For: 0.14.0

 Attachments: PIG-4160-2.patch, PIG-4160-3.patch, 
 forcelocal.trunk.patch, forcelocal.withtests.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 patch adds a -j/forcelocaljars flag which if enabled allows you to do 
 pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
 thus loading the pig script REMOTELY 
 while loading the jar files LOCALLY
 One does this to avoid a single point of failure but avoid one central 
 interversion dependent repository for all the jars across all teams/projects.




[jira] [Commented] (PIG-4160) Provide a way to pass local jars in pig.additional.jars when using a remote url for a script

2014-10-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190848#comment-14190848
 ] 

Thejas M Nair commented on PIG-4160:


+1

 Provide a way to pass local jars in pig.additional.jars when using a remote 
 url for a script
 

 Key: PIG-4160
 URL: https://issues.apache.org/jira/browse/PIG-4160
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.14.0
Reporter: Andrew C. Oliver
  Labels: patch
 Fix For: 0.14.0

 Attachments: PIG-4160-2.patch, PIG-4160-3.patch, PIG-4160-4.patch, 
 forcelocal.trunk.patch, forcelocal.withtests.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 patch adds a -j/forcelocaljars flag which if enabled allows you to do 
 pig -j -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig
 thus loading the pig script REMOTELY 
 while loading the jar files LOCALLY
 One does this to avoid a single point of failure but avoid one central 
 interversion dependent repository for all the jars across all teams/projects.




[jira] [Commented] (PIG-4250) Fix Security Risks found by Coverity

2014-10-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191114#comment-14191114
 ] 

Thejas M Nair commented on PIG-4250:


Blocks such as the following aren't necessary - 
{code}
} catch (IOException e) {
    throw e;
}
{code}

You can use Hadoop IOUtils.closeStream or cleanup to call close. You don't need 
the additional try-catch and null check with that.
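
The pattern can be sketched as below. To keep the example free of a Hadoop 
dependency, closeQuietly is a hypothetical local helper that only approximates 
what a call such as IOUtils.closeStream is used for here: it tolerates null 
and swallows the IOException from close(). The class and method names are 
illustrative and not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class CloseSketch {
    /** Hypothetical helper: closes the stream if non-null, ignoring close errors. */
    public static void closeQuietly(Closeable stream) {
        if (stream != null) {
            try {
                stream.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if close itself fails.
            }
        }
    }

    /** Reads the first byte; no catch-and-rethrow block is needed. */
    public static int firstByte(byte[] data) throws IOException {
        InputStream in = null;
        try {
            in = new ByteArrayInputStream(data);
            return in.read();
        } finally {
            // Runs on both success and failure, so the stream is always closed.
            closeQuietly(in);
        }
    }
}
```

Any IOException from the read simply propagates to the caller, and the 
finally block needs neither its own try-catch nor a null check at the call 
site.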


 Fix Security Risks found by Coverity
 

 Key: PIG-4250
 URL: https://issues.apache.org/jira/browse/PIG-4250
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4250-1.patch, PIG-4250-2.patch


 Here is the report: https://scan.coverity.com/projects/3026 (Need to register 
 to see). Most belong to one pattern: not close stream when exception happens.




[jira] [Commented] (PIG-4151) Pig Cannot Write Empty Maps to HBase

2014-10-14 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171925#comment-14171925
 ] 

Thejas M Nair commented on PIG-4151:


+1

 Pig Cannot Write Empty Maps to HBase
 

 Key: PIG-4151
 URL: https://issues.apache.org/jira/browse/PIG-4151
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4151-1.patch


 Pig is unable to write empty maps to HBase. Steps to reproduce:
 input file pig_data_bad.txt:
 {code}
 row1;Homer;Morrison;[1#Silvia,2#Stacy]
 row2;Sheila;Fletcher;[1#Becky,2#Salvador,3#Lois]
 row4;Andre;Morton;[1#Nancy]
 row3;Sonja;Webb;[]
 {code}
 Create table in hbase:
 create 'test', 'info', 'friends'
 Pig script:
 {code}
 source = LOAD '/pig_data_bad.txt' USING PigStorage(';') AS (row:chararray, 
 first_name:chararray, last_name:chararray, friends:map[]);
 STORE source INTO 'hbase://test' USING 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:fname info:lname 
 friends:*');
 {code}
 Stack:
 java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:880)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
 at 
 org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
 at 
 org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)




[jira] [Commented] (PIG-4128) New logical optimizer rule: ConstantCalculator

2014-08-26 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111505#comment-14111505
 ] 

Thejas M Nair commented on PIG-4128:


+1

 New logical optimizer rule: ConstantCalculator
 --

 Key: PIG-4128
 URL: https://issues.apache.org/jira/browse/PIG-4128
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4128-1.patch, PIG-4128-2.patch, PIG-4128-3.patch


 Pig used to have a LogicExpressionSimplifier to simplify expressions, which 
 also calculated constant expressions. That optimizer rule is buggy, and we 
 disabled it by default in PIG-2316.
 However, we do need this feature, especially for partition/predicate pushdown. 
 Since neither deals with complex constant expressions, we'd like to 
 replace the expression with a constant before the actual pushdown. Yes, the user 
 may manually do the calculation and rewrite the query, but even a rewrite is 
 sometimes not possible. Consider the case where the user wants to push a datetime 
 predicate: the user has to write a ToDate UDF call, since Pig does not have a datetime 
 constant.
 In this Jira, I provide a new rule, ConstantCalculator, which is much simpler 
 and much less error-prone, to replace LogicExpressionSimplifier.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (PIG-3522) Remove shock from pig

2013-11-01 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811774#comment-13811774
 ] 

Thejas M Nair commented on PIG-3522:


+1

 Remove shock from pig
 -

 Key: PIG-3522
 URL: https://issues.apache.org/jira/browse/PIG-3522
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0

 Attachments: PIG-3522-1.patch


 It is only used in very ancient Hadoop versions that use HOD as the resource manager. 
 Current Pig code does not use it. This includes the entire lib-src/shock 
 directory and jsch.jar.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (PIG-3503) More document for Pig 0.12 new features

2013-10-05 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787319#comment-13787319
 ] 

Thejas M Nair commented on PIG-3503:


Everything else looks good. You can commit after the changes.


 More document for Pig 0.12 new features
 ---

 Key: PIG-3503
 URL: https://issues.apache.org/jira/browse/PIG-3503
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: PIG-3503-1.patch







[jira] [Commented] (PIG-3503) More document for Pig 0.12 new features

2013-10-05 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787318#comment-13787318
 ] 

Thejas M Nair commented on PIG-3503:


"If use set command without providing key/value pair, Pig print all the 
configurations and all system properties." can be changed to 
"If set command is used without key/value pair argument, Pig prints all the 
configurations and system properties."

In perf.xml:
In the example, should we use a load function that supports partition filter 
pushdown? Otherwise, people might expect it to work with PigStorage.
Also, should the example in it without the filter statement be removed? 
{code}
+<source>
+A = LOAD 'input' as (dt, state, event);
+</source>
{code}


 More document for Pig 0.12 new features
 ---

 Key: PIG-3503
 URL: https://issues.apache.org/jira/browse/PIG-3503
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: PIG-3503-1.patch







[jira] [Commented] (PIG-3360) Some intermittent negative e2e tests fail on hadoop 2

2013-09-24 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776935#comment-13776935
 ] 

Thejas M Nair commented on PIG-3360:


Looks good. +1


 Some intermittent negative e2e tests fail on hadoop 2
 -

 Key: PIG-3360
 URL: https://issues.apache.org/jira/browse/PIG-3360
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: PIG-3360-1.patch, PIG-3360-2.patch


 One example is StreamingErrors_2. Here is the stack we get:
 Backend error message
 -
 Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: 
 Received Error while processing the map plan: 'perl PigStreamingBad.pl middle 
 ' failed with exit status: 2
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:311)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
 Pig Stack Trace
 ---
 ERROR 2244: Job failed, hadoop does not return any error message
 org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, 
 hadoop does not return any error message
   at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:145)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
   at org.apache.pig.Main.run(Main.java:604)
   at org.apache.pig.Main.main(Main.java:157)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions

2013-03-26 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614137#comment-13614137
]

Thejas M Nair commented on PIG-3259:

bq. How do we determine the number of non-numbers without making calls to
sanityCheck..()?
By counting the number of times an exception has been thrown so far by .valueOf().
Once a threshold has been crossed, we can introduce the sanity check for each
new value. This will put a limit on worst ('incorrect') case performance
without degrading the 'correct' case performance by much.

I wonder if there are good libraries that we can use for the sanity checks, as
the decimal check seems a bit more complicated.
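
The threshold idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class AdaptiveLongParser, the method looksNumeric, and the threshold parameter are hypothetical names, and the pre-check is a deliberately simplified one (optional sign, digits, optional fraction) that would reject exponent or hexadecimal forms once it is switched on. It is not the code from the patch.

```java
class AdaptiveLongParser {
    private final int threshold;   // hypothetical tuning knob
    private int failures = 0;      // conversions that ended in an exception

    public AdaptiveLongParser(int threshold) {
        this.threshold = threshold;
    }

    // Simplified pre-check (assumption): optional sign, at least one digit,
    // at most one decimal point. Exponent and hex forms are rejected.
    public static boolean looksNumeric(String s) {
        int i = 0;
        int n = s.length();
        if (i < n && (s.charAt(i) == '-' || s.charAt(i) == '+')) {
            i++;
        }
        int digits = 0;
        boolean seenDot = false;
        for (; i < n; i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                digits++;
            } else if (c == '.' && !seenDot) {
                seenDot = true;
            } else {
                return false;
            }
        }
        return digits > 0;
    }

    // Returns the value as a Long, or null if it cannot be converted.
    public Long parse(String s) {
        // The pre-check only runs after enough failures have been seen,
        // so clean input never pays for it.
        if (failures >= threshold && !looksNumeric(s)) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            try {
                return Double.valueOf(s).longValue();
            } catch (NumberFormatException e2) {
                failures++;
                return null;
            }
        }
    }

    public int failureCount() {
        return failures;
    }
}
```

The design choice here is that the 'correct' case pays nothing until the threshold is reached, while the 'incorrect' case stops paying for exceptions after at most threshold bad values.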

Optimize byte to Long/Integer conversions
-

Key: PIG-3259
URL: https://issues.apache.org/jira/browse/PIG-3259
Project: Pig
Issue Type: Bug
Affects Versions: 0.11, 0.11.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
Fix For: 0.12

Attachments: byteToLong.xlsx

These conversions could perform better. If the input is not numeric
(1234abcd), the code still calls Double.valueOf(String) before finally
returning null. Any script that inadvertently (user's mistake or not) tries
to cast a non-numeric column to int or long would result in many wasteful
calls.
We can avoid this by handling only the cases where we find the input to be a decimal
number (1234.56) and returning null otherwise, even before trying
Double.valueOf(String).


[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions

2013-03-25 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613354#comment-13613354
]

Thejas M Nair commented on PIG-3259:

Sounds like a good idea.
The check you have here does not accept all valid double string representations
(see
http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#valueOf(java.lang.String)
), e.g. those with an exponent, or hexadecimal representations starting with 0x.

But if we can avoid the performance degradation for the 'correct' [1] case
(which seems to be in the range of 2-8% in the micro benchmark that ran for at
least a few seconds), that would be better. One way to avoid performance
degradation for the 'correct' case would be to start by doing .valueOf() without
checks, then use the number of non-numbers encountered to decide whether to
make the sanityCheckIntegerLongDecimal() calls.

[1] - by correct I mean the case where a field declared as an integer or a
double has a correct representation.
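
To illustrate the point about valid double representations, here is a small sketch. The class DoubleFormats and the method naiveDecimalCheck are hypothetical; the regular expression is an assumed, deliberately naive check and is not the sanityCheckIntegerLongDecimal() code from the patch. Double.valueOf(String) itself is the standard JDK method.

```java
class DoubleFormats {
    // Deliberately naive check (assumption): optional sign, digits,
    // optional fractional part. Nothing else is accepted.
    public static boolean naiveDecimalCheck(String s) {
        return s.matches("[+-]?\\d+(\\.\\d+)?");
    }

    // Thin wrapper around the JDK parser, which also accepts exponent
    // notation and hexadecimal floating-point strings.
    public static double parse(String s) {
        return Double.valueOf(s);
    }
}
```

For example, parse("1.5e3") yields 1500.0 and parse("0x1p3") yields 8.0, yet naiveDecimalCheck returns false for both strings, so a check of this shape would turn valid input into null.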

Optimize byte to Long/Integer conversions
-

Key: PIG-3259
URL: https://issues.apache.org/jira/browse/PIG-3259
Project: Pig
Issue Type: Bug
Affects Versions: 0.11, 0.11.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
Fix For: 0.12

Attachments: byteToLong.xlsx

[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605460#comment-13605460
 ] 

Thejas M Nair commented on PIG-3248:


Daniel, is the following change relevant to the Hadoop 2 version upgrade?

{code}
===
--- test/org/apache/pig/test/TestPigSplit.java  (revision 1456106)
+++ test/org/apache/pig/test/TestPigSplit.java  (working copy)
@@ -108,7 +108,7 @@
 createInput(new String[] { "0\ta" });
 
 pigServer.registerQuery("a = load '" + inputFileName + "';");
-for (int i = 0; i < 500; i++) {
+for (int i = 0; i < 200; i++) {
 pigServer.registerQuery("a = filter a by $0 == '1';");
 }
 Iterator<Tuple> iter = pigServer.openIterator("a");
{code}

 Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
 

 Key: PIG-3248
 URL: https://issues.apache.org/jira/browse/PIG-3248
 Project: Pig
  Issue Type: Improvement
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3248-1.patch





[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605584#comment-13605584
 ] 

Thejas M Nair commented on PIG-3248:


bq. This is actually a different issue. TestPigSplit always fail on my test 
machine due to stack overflow. It is Ok if you want me open a separate Jira for 
this issue.
As it is a very minor change to a test case, I think it's fine to include it 
here. 200 is large enough for the number of statements, so I think using that many levels 
instead of 500 should be OK if 500 is causing issues with some JVM configs.



 Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
 

 Key: PIG-3248
 URL: https://issues.apache.org/jira/browse/PIG-3248
 Project: Pig
  Issue Type: Improvement
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3248-1.patch





[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605615#comment-13605615
 ] 

Thejas M Nair commented on PIG-3248:


+1

 Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
 

 Key: PIG-3248
 URL: https://issues.apache.org/jira/browse/PIG-3248
 Project: Pig
  Issue Type: Improvement
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3248-1.patch





[jira] [Commented] (PIG-3089) Implicit relation names

2012-12-11 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529125#comment-13529125
 ] 

Thejas M Nair commented on PIG-3089:


In my opinion, too many rules for implicit relation names would make Pig 
scripts (written by others) hard to read, especially for people who are new to 
Pig. I think it is better to just allow the name of the preceding relation to be 
referred to using a special notation. 

 Implicit relation names
 ---

 Key: PIG-3089
 URL: https://issues.apache.org/jira/browse/PIG-3089
 Project: Pig
  Issue Type: New Feature
  Components: grunt, parser
Reporter: Russell Jurney
Assignee: Jonathan Coveney

 A = load foo;
 B = load bar;
 filter A by id  5;
 join A_1 by id, B by id;
 // or A_filter
 foreach A_1_B generate id;
 store into foobar; // A_1_B_1 or A_filter_B_generate
 Or some such routine?
 We don't have to be explicit no more!


[jira] [Commented] (PIG-3044) Trigger POPartialAgg compaction under GC pressure

2012-12-07 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526600#comment-13526600
]

Thejas M Nair commented on PIG-3044:

bq. I would even say we remove the % memory budget as the Spillable mechanism
is more reliable and much simpler.
The reason the % memory budget was introduced for SelfSpillBag was that the
Spillable mechanism didn't always work well; the cleanup often got
triggered too late. So I think it is better to use the Spillable mechanism here to
spill earlier if necessary, as the patch is doing.

Trigger POPartialAgg compaction under GC pressure
-

Key: PIG-3044
URL: https://issues.apache.org/jira/browse/PIG-3044
Project: Pig
Issue Type: Improvement
Affects Versions: 0.10.0, 0.11, 0.10.1
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Fix For: 0.11, 0.12

Attachments: PIG-3044.2.diff, PIG-3404.diff

If partial aggregation is turned on in pig 10 and 11, 20% (by default) of the
available heap can be consumed by the POPartialAgg operator. This can cause
memory issues for jobs that use all, or nearly all, of the heap already.
If we make POPartialAgg spillable (trigger compaction when memory reduction
is required), we would be much nicer to high-memory jobs.

[jira] [Updated] (PIG-3071) update hcatalog jar and path to hbase storage handler har

2012-11-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-3071:
---

Labels: hcatalog  (was: )

 update hcatalog jar and path to hbase storage handler har
 -

 Key: PIG-3071
 URL: https://issues.apache.org/jira/browse/PIG-3071
 Project: Pig
  Issue Type: Bug
Reporter: Arpit Gupta
  Labels: hcatalog
 Attachments: PIG-3071.patch


 Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar 
 name and the path to the hbase storage handler jar.
 pig script should be updated to work with either version.


[jira] [Updated] (PIG-2176) add logical plan assumption checker

2012-11-12 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2176:
---

Assignee: Thejas M Nair

 add logical plan assumption checker 
 

 Key: PIG-2176
 URL: https://issues.apache.org/jira/browse/PIG-2176
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.10.0

 Attachments: PIG-2176.1.patch, PIG-2176.2.patch


 Pig expects certain things about LogicalPlan, and optimizer logic depends on 
 those to be true. Code that verifies that these assumptions are true will 
 help in catching issues early on. 
 Some of the assumptions that should be checked: 
 1. All schemas have a valid uid (not -1).
 2. All fields in a schema have distinct uids. 


[jira] [Commented] (PIG-2959) Add a pig.cmd for Pig to run under Windows

2012-11-11 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495092#comment-13495092
 ] 

Thejas M Nair commented on PIG-2959:


+1

 Add a pig.cmd for Pig to run under Windows
 --

 Key: PIG-2959
 URL: https://issues.apache.org/jira/browse/PIG-2959
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: pig.cmd





[jira] [Commented] (PIG-2980) documentation for DateTime datatype

2012-11-09 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494274#comment-13494274
 ] 

Thejas M Nair commented on PIG-2980:


bq. Yes, I mean ToDate('1970-01-01T00:00:00.000+00:00'). where users can 
specify a constant string to create a datetime object. Let me rephrase the 
description here.
I think we can just remove datestamp from the constants table and add a note 
under the table that users should use the ToDate UDF to generate a datetime from 
string constants. 
 

 documentation for DateTime datatype
 ---

 Key: PIG-2980
 URL: https://issues.apache.org/jira/browse/PIG-2980
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
Assignee: Zhijie Shen
 Fix For: 0.11

 Attachments: PIG-2980.patch


 Documentation for new DateTime type needs to be added.


[jira] [Updated] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10

2012-10-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2434:
---

Fix Version/s: (was: 0.11)

 investigate 5% slowdown in TPC-H Q6 query in 0.10
 -

 Key: PIG-2434
 URL: https://issues.apache.org/jira/browse/PIG-2434
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thejas M Nair

 0.10 is slower than 0.9 by around 5% for TPC-H Q6 query as per observation in 
 https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461
  .
 This needs to be investigated.


[jira] [Commented] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10

2012-10-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486693#comment-13486693
 ] 

Thejas M Nair commented on PIG-2434:


Unlinking from 0.11 .

 investigate 5% slowdown in TPC-H Q6 query in 0.10
 -

 Key: PIG-2434
 URL: https://issues.apache.org/jira/browse/PIG-2434
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thejas M Nair

 0.10 is slower than 0.9 by around 5% for TPC-H Q6 query as per observation in 
 https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461
  .
 This needs to be investigated.


[jira] [Updated] (PIG-2981) add e2e tests for DateTime data type

2012-10-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2981:
---

Fix Version/s: (was: 0.11)
   0.12

 add e2e tests for DateTime  data type
 -

 Key: PIG-2981
 URL: https://issues.apache.org/jira/browse/PIG-2981
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
 Fix For: 0.12


 e2e tests for DateTime datatype need to be added.


[jira] [Commented] (PIG-2981) add e2e tests for DateTime data type

2012-10-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486695#comment-13486695
 ] 

Thejas M Nair commented on PIG-2981:


I will try to do it this week and get it into 0.11. But for now I am setting the fix 
version to 0.12, just in case.

 add e2e tests for DateTime  data type
 -

 Key: PIG-2981
 URL: https://issues.apache.org/jira/browse/PIG-2981
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
 Fix For: 0.12


 e2e tests for DateTime datatype need to be added.


[jira] [Created] (PIG-3007) support group-by collected for load funcs that don't implement CollectableLoadFunc

2012-10-25 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-3007:
--

 Summary: support group-by collected for load funcs that don't 
implement CollectableLoadFunc
 Key: PIG-3007
 URL: https://issues.apache.org/jira/browse/PIG-3007
 Project: Pig
  Issue Type: New Feature
Reporter: Thejas M Nair


group-by collected should be supported for all inputs that are sorted on the 
group-by keys.
To ensure that a map task gets all records for a group key, indexing can be 
done to determine the key at which it should start processing, and whether it 
should also read from the next split to get the remaining records for the last 
group-by key in its original split.



[jira] [Commented] (PIG-2982) add unit tests for DateTime type that test setting timezone

2012-10-22 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482047#comment-13482047
 ] 

Thejas M Nair commented on PIG-2982:


+1. Will commit after running tests.


 add unit tests for DateTime type that test setting timezone
 ---

 Key: PIG-2982
 URL: https://issues.apache.org/jira/browse/PIG-2982
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
Assignee: Zhijie Shen
 Fix For: 0.11

 Attachments: PIG-2982.patch


 The default timezone can be set for the new DateTime type. We need to add 
 unit tests that test this functionality. 


[jira] [Created] (PIG-2980) documentation for DateTime datatype

2012-10-15 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-2980:
--

 Summary: documentation for DateTime datatype
 Key: PIG-2980
 URL: https://issues.apache.org/jira/browse/PIG-2980
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
 Fix For: 0.11


Documentation for new DateTime type needs to be added.



[jira] [Created] (PIG-2981) add e2e tests for DateTime data type

2012-10-15 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-2981:
--

 Summary: add e2e tests for DateTime  data type
 Key: PIG-2981
 URL: https://issues.apache.org/jira/browse/PIG-2981
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
 Fix For: 0.11


e2e tests for DateTime datatype need to be added.



[jira] [Created] (PIG-2982) add unit tests for DateTime type that test setting timezone

2012-10-15 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-2982:
--

 Summary: add unit tests for DateTime type that test setting 
timezone
 Key: PIG-2982
 URL: https://issues.apache.org/jira/browse/PIG-2982
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
 Fix For: 0.11


The default timezone can be set for the new DateTime type. We need to add unit 
tests that test this functionality. 



[jira] [Resolved] (PIG-1314) Add DateTime Support to Pig

2012-10-15 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1314.


Resolution: Fixed

As Dmitriy suggested, I am closing this JIRA and have opened new ones for the remaining work: 
PIG-2980, PIG-2981, PIG-2982.


 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
  Labels: gsoc2012
 Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
 PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
 PIG-1314-7.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012


[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-12 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Assignee: Eli Reisman  (was: Thejas M Nair)

 Make toString() methods on Schema and FieldSchema be readable by 
 Utils.getSchemaFromString()
 

 Key: PIG-2910
 URL: https://issues.apache.org/jira/browse/PIG-2910
 Project: Pig
  Issue Type: Bug
  Components: impl, parser
Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
Reporter: Russell Jurney
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
 PIG-2910-4.patch


 I want to toString() schemas and send them to the backend via UDFContext. At 
 the moment this requires writing your own toString() method that 
 Utils.getSchemaFromString() can read. Making a readable schema for the 
 backend would be an improvement.
 I spoke with Thejas, who believes this is a bug. The workaround for the 
 moment is, for example:
 String schemaString = inputSchema.toString().substring(1, 
 inputSchema.toString().length() - 1);
 // Set the input schema for processing
 UDFContext context = UDFContext.getUDFContext();
 Properties udfProp = context.getUDFProperties(this.getClass());
 udfProp.setProperty("horton.json.udf.schema", schemaString);
 ...
 schema = Utils.getSchemaFromString(strSchema);


[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-12 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Issue Type: Improvement  (was: Bug)

 Make toString() methods on Schema and FieldSchema be readable by 
 Utils.getSchemaFromString()
 

 Key: PIG-2910
 URL: https://issues.apache.org/jira/browse/PIG-2910
 Project: Pig
  Issue Type: Improvement
  Components: impl, parser
Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
Reporter: Russell Jurney
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
 PIG-2910-4.patch


 I want to toString() schemas and send them to the backend via UDFContext. At 
 the moment this requires writing your own toString() method that 
 Utils.getSchemaFromString() can read. Making a readable schema for the 
 backend would be an improvement.
 I spoke with Thejas, who believes this is a bug. The workaround for the 
 moment is, for example:
 String schemaString = inputSchema.toString().substring(1, 
 inputSchema.toString().length() - 1);
 // Set the input schema for processing
 UDFContext context = UDFContext.getUDFContext();
 Properties udfProp = context.getUDFProperties(this.getClass());
 udfProp.setProperty("horton.json.udf.schema", schemaString);
 ...
 schema = Utils.getSchemaFromString(strSchema);


[jira] [Updated] (PIG-2910) Add function to read schema from outout of Schema.toString()

2012-10-12 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Summary: Add function to read schema from outout of Schema.toString()  
(was: Make toString() methods on Schema and FieldSchema be readable by 
Utils.getSchemaFromString())

 Add function to read schema from outout of Schema.toString()
 

 Key: PIG-2910
 URL: https://issues.apache.org/jira/browse/PIG-2910
 Project: Pig
  Issue Type: Improvement
  Components: impl, parser
Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
Reporter: Russell Jurney
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
 PIG-2910-4.patch


 I want to toString() schemas and send them to the backend via UDFContext. At 
 the moment this requires writing your own toString() method that 
 Utils.getSchemaFromString() can read. Making a readable schema for the 
 backend would be an improvement.
 I spoke with Thejas, who believes this is a bug. The workaround for the 
 moment is, for example:
 String schemaString = inputSchema.toString().substring(1, 
 inputSchema.toString().length() - 1);
 // Set the input schema for processing
 UDFContext context = UDFContext.getUDFContext();
 Properties udfProp = context.getUDFProperties(this.getClass());
 udfProp.setProperty("horton.json.udf.schema", schemaString);
 ...
 schema = Utils.getSchemaFromString(strSchema);


[jira] [Updated] (PIG-2910) Add function to read schema from outout of Schema.toString()

2012-10-12 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Fix Version/s: (was: 0.10.1)
   Status: Patch Available  (was: Open)

 Add function to read schema from outout of Schema.toString()
 

 Key: PIG-2910
 URL: https://issues.apache.org/jira/browse/PIG-2910
 Project: Pig
  Issue Type: Improvement
  Components: impl, parser
Affects Versions: 0.10.0, 0.9.2, 0.11, 0.10.1
Reporter: Russell Jurney
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.11

 Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
 PIG-2910-4.patch


 I want to toString() schemas and send them to the backend via UDFContext. At 
 the moment this requires writing your own toString() method that 
 Utils.getSchemaFromString() can read. Making a readable schema for the 
 backend would be an improvement.
 I spoke with Thejas, who believes this is a bug. The workaround for the 
 moment is, for example:
 String schemaString = inputSchema.toString().substring(1, 
 inputSchema.toString().length() - 1);
 // Set the input schema for processing
 UDFContext context = UDFContext.getUDFContext();
 Properties udfProp = context.getUDFProperties(this.getClass());
 udfProp.setProperty("horton.json.udf.schema", schemaString);
 ...
 schema = Utils.getSchemaFromString(strSchema);


[jira] [Updated] (PIG-2910) Add function to read schema from outout of Schema.toString()

2012-10-12 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2910:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1 Patch committed to trunk.
Eli, Thanks for the patch!

 Add function to read schema from outout of Schema.toString()
 

 Key: PIG-2910
 URL: https://issues.apache.org/jira/browse/PIG-2910
 Project: Pig
  Issue Type: Improvement
  Components: impl, parser
Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
Reporter: Russell Jurney
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.11

 Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
 PIG-2910-4.patch


 I want to toString() schemas and send them to the backend via UDFContext. At 
 the moment this requires writing your own toString() method that 
 Utils.getSchemaFromString() can read. Making a readable schema for the 
 backend would be an improvement.
 I spoke with Thejas, who believes this is a bug. The workaround for the 
 moment is, for example:
 String schemaString = inputSchema.toString().substring(1, 
 inputSchema.toString().length() - 1);
 // Set the input schema for processing
 UDFContext context = UDFContext.getUDFContext();
 Properties udfProp = context.getUDFProperties(this.getClass());
 udfProp.setProperty("horton.json.udf.schema", schemaString);
 ...
 schema = Utils.getSchemaFromString(strSchema);


[jira] [Commented] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-10 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473535#comment-13473535
 ] 

Thejas M Nair commented on PIG-2910:


Yes, the changes in the 2910-3 patch look good. Can you please add a test case, 
and also add a comment that this schema string has {} around it, and that is 
why the substring is being done?


 Make toString() methods on Schema and FieldSchema be readable by 
 Utils.getSchemaFromString()
 

 Key: PIG-2910
 URL: https://issues.apache.org/jira/browse/PIG-2910
 Project: Pig
  Issue Type: Bug
  Components: impl, parser
Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
Reporter: Russell Jurney
Assignee: Thejas M Nair
  Labels: newbie
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch


 I want to toString() schemas and send them to the backend via UDFContext. At 
 the moment this requires writing your own toString() method that 
 Utils.getSchemaFromString() can read. Making a readable schema for the 
 backend would be an improvement.
 I spoke with Thejas, who believes this is a bug. The workaround for the 
 moment is, for example:
 String schemaString = inputSchema.toString().substring(1, 
 inputSchema.toString().length() - 1);
 // Set the input schema for processing
 UDFContext context = UDFContext.getUDFContext();
 Properties udfProp = context.getUDFProperties(this.getClass());
 udfProp.setProperty("horton.json.udf.schema", schemaString);
 ...
 schema = Utils.getSchemaFromString(strSchema);


[jira] [Commented] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-09 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472941#comment-13472941
 ] 

Thejas M Nair commented on PIG-2910:


The extra parenthesis added by Schema.toString() are curly braces. I believe 
this is because the schema when thought of as a schema of a relation, is a 
schema of a bag. I don't think the PIG-2910-2.patch will fix the issue.

But changing the behavior of either Schema.toString() or 
Utils.getSchemaFromString() will break backward compatibility. I think we 
should just add a new function Utils.getSchemaFromBagSchemaString() and comment 
that people should use this one to get schema back from output of 
Schema.toString(). 

The use of input schema of udf, during udf execution is not very common. I 
don't think we should serialize it for all udfs.
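
As a rough illustration of the brace handling discussed above, here is a 
minimal sketch in plain Java with no Pig classes. The class and method names 
(BagSchemaStrings, stripOuterBraces) are hypothetical and are not Pig's API, 
and the sample input only assumes that the string is wrapped in one pair of 
curly braces. The result is what would then be passed to 
Utils.getSchemaFromString().

```java
public class BagSchemaStrings {

    /**
     * Removes one pair of enclosing curly braces, if present.
     * Hypothetical helper for illustration only; not part of Pig.
     */
    public static String stripOuterBraces(String schemaString) {
        String trimmed = schemaString.trim();
        boolean wrapped = trimmed.length() >= 2
                && trimmed.charAt(0) == '{'
                && trimmed.charAt(trimmed.length() - 1) == '}';
        // Only the outermost pair is removed; nested bags keep their braces.
        return wrapped ? trimmed.substring(1, trimmed.length() - 1) : trimmed;
    }

    public static void main(String[] args) {
        // Assumed sample input: a schema string wrapped in {}.
        System.out.println(stripOuterBraces("{a: int,b: chararray}"));
    }
}
```

Checking for the braces before cutting, instead of an unconditional 
substring(1, length - 1), keeps the helper safe on strings that are already 
unwrapped.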


 Make toString() methods on Schema and FieldSchema be readable by 
 Utils.getSchemaFromString()
 

 Key: PIG-2910
 URL: https://issues.apache.org/jira/browse/PIG-2910
 Project: Pig
  Issue Type: Bug
  Components: impl, parser
Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
Reporter: Russell Jurney
Assignee: Thejas M Nair
  Labels: newbie
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2910-1.patch, PIG-2910-2.patch


 I want to toString() schemas and send them to the backend via UDFContext. At 
 the moment this requires writing your own toString() method that 
 Utils.getSchemaFromString() can read. Making a readable schema for the 
 backend would be an improvement.
 I spoke with Thejas, who believes this is a bug. The workaround for the 
 moment is, for example:
 String schemaString = inputSchema.toString().substring(1, 
 inputSchema.toString().length() - 1);
 // Set the input schema for processing
 UDFContext context = UDFContext.getUDFContext();
 Properties udfProp = context.getUDFProperties(this.getClass());
 udfProp.setProperty("horton.json.udf.schema", schemaString);
 ...
 schema = Utils.getSchemaFromString(strSchema);


[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-10-04 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2769:
---

Assignee: Timothy Chen

 a simple logic causes very long compiling time on pig 0.10.0
 

 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
Reporter: Dan Li
Assignee: Timothy Chen
 Fix For: 0.11

 Attachments: case1.tar


 We found the following simple logic will cause very long compiling time for 
 pig 0.10.0, while using pig 0.8.1, everything is fine.
 A = load 'A.txt' using PigStorage()  AS (m: int);
 B = FOREACH A {
 days_str = (chararray)
 (m == 1 ? 31: 
 (m == 2 ? 28: 
 (m == 3 ? 31: 
 (m == 4 ? 30: 
 (m == 5 ? 31: 
 (m == 6 ? 30: 
 (m == 7 ? 31: 
 (m == 8 ? 31: 
 (m == 9 ? 30: 
 (m == 10 ? 31: 
 (m == 11 ? 30:31)))))))))));
 GENERATE
days_str as days_str;
 }   
 store B into 'B';
 and here's a simple input file example: A.txt
 1
 2
 3
 The pig version we used in the test
 Apache Pig version 0.10.0-SNAPSHOT (rexported)


[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-10-04 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469592#comment-13469592
 ] 

Thejas M Nair commented on PIG-2769:


I have assigned it to you. I have also added you to contributors list, now you 
should be able to assign jiras to yourself.


 a simple logic causes very long compiling time on pig 0.10.0
 

 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
Reporter: Dan Li
Assignee: Timothy Chen
 Fix For: 0.11

 Attachments: case1.tar


 We found the following simple logic will cause very long compiling time for 
 pig 0.10.0, while using pig 0.8.1, everything is fine.
 A = load 'A.txt' using PigStorage()  AS (m: int);
 B = FOREACH A {
 days_str = (chararray)
 (m == 1 ? 31: 
 (m == 2 ? 28: 
 (m == 3 ? 31: 
 (m == 4 ? 30: 
 (m == 5 ? 31: 
 (m == 6 ? 30: 
 (m == 7 ? 31: 
 (m == 8 ? 31: 
 (m == 9 ? 30: 
 (m == 10 ? 31: 
 (m == 11 ? 30:31)))))))))));
 GENERATE
days_str as days_str;
 }   
 store B into 'B';
 and here's a simple input file example: A.txt
 1
 2
 3
 The pig version we used in the test
 Apache Pig version 0.10.0-SNAPSHOT (rexported)


[jira] [Created] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-2911:
--

 Summary: GenMRSkewJoinProcessor uses File.Separator instead of 
Path.Separator
 Key: PIG-2911
 URL: https://issues.apache.org/jira/browse/PIG-2911
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-2911.1.patch

This causes testcase skewjoin.q to fail on windows.
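
For readers unfamiliar with the distinction, a small sketch in plain Java: 
java.io.File.separator depends on the platform ("\" on Windows), while 
Hadoop's Path.SEPARATOR is "/" everywhere. The constant and class below are 
stand-ins written for this illustration so the example runs without Hadoop on 
the classpath; they are not taken from the patch.

```java
import java.io.File;

public class SeparatorDemo {

    // Stand-in for org.apache.hadoop.fs.Path.SEPARATOR, which is "/" on all platforms.
    public static final String PATH_SEPARATOR = "/";

    /** Joins two path components with the platform-independent separator. */
    public static String join(String parent, String child) {
        return parent + PATH_SEPARATOR + child;
    }

    public static void main(String[] args) {
        // Platform dependent: prints "\" on Windows and "/" on Unix-like systems.
        System.out.println("File.separator = " + File.separator);
        // Always uses "/", regardless of the local operating system.
        System.out.println(join("tmp", "skewjoin"));
    }
}
```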



[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2911:
---

Attachment: PIG-2911.1.patch

 GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
 

 Key: PIG-2911
 URL: https://issues.apache.org/jira/browse/PIG-2911
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-2911.1.patch


 This causes testcase skewjoin.q to fail on windows.


[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2911:
---

Status: Patch Available  (was: Open)

 GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
 

 Key: PIG-2911
 URL: https://issues.apache.org/jira/browse/PIG-2911
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-2911.1.patch


 This causes testcase skewjoin.q to fail on windows.


[jira] [Updated] (PIG-2911) GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator

2012-09-07 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2911:
---

Resolution: Invalid
Status: Resolved  (was: Patch Available)

Sorry, created the bug on wrong product! :)


 GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
 

 Key: PIG-2911
 URL: https://issues.apache.org/jira/browse/PIG-2911
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-2911.1.patch


 This causes testcase skewjoin.q to fail on windows.


[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed to trunk

 jodatime jar missing in pig-withouthadoop.jar
 -

 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11

 Attachments: PIG-2895.1.patch, PIG-2895.2.patch


 jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
 is used, pig will fail with class not found error.


[jira] [Assigned] (PIG-2893) fix DBStorage compile issue

2012-08-29 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned PIG-2893:
--

Assignee: Thejas M Nair

 fix DBStorage compile issue
 ---

 Key: PIG-2893
 URL: https://issues.apache.org/jira/browse/PIG-2893
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11

 Attachments: PIG-2893.1.patch


 DBStorage does not compile after the datetime patch was committed. The joda 
 datetime was passed as argument to java.sql.PreparedStatement.setDate() 
 instead of java.sql.Date .


[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-29 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Attachment: PIG-2895.2.patch

PIG-2895.2.patch - adds org/joda/time to pigPackagesToSend.
Thanks Julien for the directions!


 jodatime jar missing in pig-withouthadoop.jar
 -

 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11

 Attachments: PIG-2895.1.patch, PIG-2895.2.patch


 jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
 is used, pig will fail with class not found error.


[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-29 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Status: Patch Available  (was: Open)

Patch tested against a multi-node hadoop cluster.


 jodatime jar missing in pig-withouthadoop.jar
 -

 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11

 Attachments: PIG-2895.1.patch, PIG-2895.2.patch


 jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
 is used, pig will fail with class not found error.


[jira] [Created] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-28 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-2895:
--

 Summary: jodatime jar missing in pig-withouthadoop.jar
 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11


jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
is used, pig will fail with class not found error.



[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Attachment: PIG-2895.1.patch

 jodatime jar missing in pig-withouthadoop.jar
 -

 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11

 Attachments: PIG-2895.1.patch


 jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
 is used, pig will fail with class not found error.


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-28 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443630#comment-13443630
 ] 

Thejas M Nair commented on PIG-1314:


Yes, that was not intentional. Deleted JobControlCompiler.java.orig in svn.


 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
  Labels: gsoc2012
 Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
 PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
 PIG-1314-7.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012


[jira] [Updated] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2893:
---

Attachment: PIG-2893.1.patch

PIG-2893.1.patch - fix for compile issue, updates to test case to use datetime 
type.


 fix DBStorage compile issue
 ---

 Key: PIG-2893
 URL: https://issues.apache.org/jira/browse/PIG-2893
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Attachments: PIG-2893.1.patch


 DBStorage does not compile after the datetime patch was committed. The joda 
 datetime was passed as argument to java.sql.PreparedStatement.setDate() 
 instead of java.sql.Date .


[jira] [Created] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-2893:
--

 Summary: fix DBStorage compile issue
 Key: PIG-2893
 URL: https://issues.apache.org/jira/browse/PIG-2893
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Attachments: PIG-2893.1.patch

DBStorage does not compile after the datetime patch was committed. The joda 
datetime was passed as argument to java.sql.PreparedStatement.setDate() instead 
of java.sql.Date .



[jira] [Commented] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442922#comment-13442922
 ] 

Thejas M Nair commented on PIG-2893:


The compile error was - 
[javac] symbol  : method setDate(int,java.util.Date)
[javac] location: interface java.sql.PreparedStatement
[javac] ps.setDate(sqlPos, ((DateTime) field).toDate());
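
The error arises because PreparedStatement.setDate() takes a java.sql.Date, 
while the expression above yields a java.util.Date. A minimal sketch of the 
usual conversion follows. It is not taken from PIG-2893.1.patch; the class and 
method names are made up for illustration, and java.util.Date stands in for 
the Joda value so that the example runs without Joda on the classpath.

```java
import java.sql.Date;

public class SqlDateConversion {

    /**
     * Wraps the millisecond value of a java.util.Date in a java.sql.Date,
     * the type that PreparedStatement.setDate(int, java.sql.Date) accepts.
     */
    public static Date toSqlDate(java.util.Date utilDate) {
        return new Date(utilDate.getTime());
    }

    public static void main(String[] args) {
        java.util.Date utilDate = new java.util.Date();
        Date sqlDate = toSqlDate(utilDate);
        // Both objects carry the same epoch-millisecond value.
        System.out.println(sqlDate.getTime() == utilDate.getTime());
    }
}
```

With a Joda DateTime the same idea would presumably read 
new java.sql.Date(dateTime.getMillis()).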

 fix DBStorage compile issue
 ---

 Key: PIG-2893
 URL: https://issues.apache.org/jira/browse/PIG-2893
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Attachments: PIG-2893.1.patch


 DBStorage does not compile after the datetime patch was committed. The joda 
 datetime was passed as argument to java.sql.PreparedStatement.setDate() 
 instead of java.sql.Date .


[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440789#comment-13440789
 ] 

Thejas M Nair commented on PIG-2662:


Koji, what OS and JVM are you using?


 skew join does not honor its config parameters
 --

 Key: PIG-2662
 URL: https://issues.apache.org/jira/browse/PIG-2662
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.2, 0.10.0
Reporter: Thejas M Nair
Assignee: Rajesh Balamohan
 Fix For: 0.11

 Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch


 Skew join can be configured using pig.sksampler.samplerate and 
 pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
 config values from properties (PoissonSampleLoader.computeSamples) is not 
 getting called. 
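
To make the intended behaviour concrete, here is a minimal sketch of reading 
the two settings from a java.util.Properties object. The property names come 
from the description above; everything else is an assumption made for this 
illustration: the class name, parsing both values as doubles, and the fallback 
values passed in by the caller. It does not reproduce the code in 
PoissonSampleLoader.

```java
import java.util.Properties;

public class SkewJoinConfig {

    // Property names as given in the issue description.
    public static final String SAMPLE_RATE_KEY = "pig.sksampler.samplerate";
    public static final String MEM_USAGE_KEY = "pig.skewedjoin.reduce.memusage";

    /**
     * Returns the numeric value stored under key, or fallback when the
     * property is missing or is not a valid number.
     */
    public static double readDouble(Properties props, String key, double fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MEM_USAGE_KEY, "0.3");
        // The fallbacks below are placeholders, not Pig's documented defaults.
        System.out.println(readDouble(props, MEM_USAGE_KEY, 0.5));
        System.out.println(readDouble(props, SAMPLE_RATE_KEY, 1.0));
    }
}
```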

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version

2012-08-23 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2811:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fixed in the PIG-1314 patch.


 Updating .eclipse.templates/.classpath with the Newest Jython Version
 -

 Key: PIG-2811
 URL: https://issues.apache.org/jira/browse/PIG-2811
 Project: Pig
  Issue Type: Bug
  Components: tools
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Trivial
 Fix For: 0.11

 Attachments: PIG-2811.patch


 Jython library version has been upgraded to 2.5.2 by the PIG-2665 patch, but 
 the related modification is not made in the Eclipse template file.


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440851#comment-13440851
]

Thejas M Nair commented on PIG-1314:

PIG-1314-7.patch committed to trunk! Thanks Zhijie.

We need to update the documentation regarding this change. Can you please
upload a new patch for that? To see the generated docs, run:
ant -Dforrest.home=<Forrest installation dir> docs
The files to be edited are under trunk/src/docs/src/documentation/ .

We should also add a few end to end test cases for datetime. See
https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-EndtoendTesting
. We should have a few queries that do some of the basic operations on date
time, and queries that have order-by , group and join on date fields.
These can be submitted as multiple patches.

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch,
PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch,
PIG-1314-7.patch

Original Estimate: 672h
Remaining Estimate: 672h

Hadoop/Pig are primarily used to parse log data, and most logs have a
timestamp component. Therefore Pig should support dates as a primitive.
Can someone familiar with adding types to pig comment on how hard this is?
We're looking at doing this, rather than use UDFs. Is this a patch that
would be accepted?
This is a candidate project for Google summer of code 2012. More information
about the program can be found at
https://cwiki.apache.org/confluence/display/PIG/GSoc2012


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440858#comment-13440858
]

Thejas M Nair commented on PIG-1314:

We also need to have some test cases that set the timezone property. This might
not be easy to do in the e2e framework, so unit test cases are a better candidate
for this. Please let me know if you need any help.

Add DateTime Support to Pig
---

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-16 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Fix Version/s: 0.11
 Assignee: Rajesh Balamohan
   Status: Patch Available  (was: Open)

 skew join does not honor its config parameters
 --

 Key: PIG-2662
 URL: https://issues.apache.org/jira/browse/PIG-2662
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0, 0.9.2
Reporter: Thejas M Nair
Assignee: Rajesh Balamohan
 Fix For: 0.11

 Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch


 Skew join can be configured using pig.sksampler.samplerate and 
 pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
 config values from properties (PoissonSampleLoader.computeSamples) is not 
 getting called. 


[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-16 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1  Patch committed to trunk.
Thanks Rajesh!


 skew join does not honor its config parameters
 --

 Key: PIG-2662
 URL: https://issues.apache.org/jira/browse/PIG-2662
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.2, 0.10.0
Reporter: Thejas M Nair
Assignee: Rajesh Balamohan
 Fix For: 0.11

 Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch


 Skew join can be configured using pig.sksampler.samplerate and 
 pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
 config values from properties (PoissonSampleLoader.computeSamples) is not 
 getting called. 


[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-14 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434294#comment-13434294
 ] 

Thejas M Nair commented on PIG-2662:


Rajesh, 
With the patch, the TestPoissonSampleLoader test cases fail. Can you please take a
look?
Please let me know if you need any help with that.


 skew join does not honor its config parameters
 --

 Key: PIG-2662
 URL: https://issues.apache.org/jira/browse/PIG-2662
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.2, 0.10.0
Reporter: Thejas M Nair
 Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch


 Skew join can be configured using pig.sksampler.samplerate and 
 pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
 config values from properties (PoissonSampleLoader.computeSamples) is not 
 getting called. 


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434733#comment-13434733
]

Thejas M Nair commented on PIG-1314:

bq. 2. According to your last response, I'm not clear how the default timezone
of client can be sent to the server with the code. In my opinion, the default
timezone should be specified on the server side by configuration, which should
be taken care of by administrators. How do you think about this.

I believe you should be able to set the default timezone property in the
PigContext constructor, and also let the user override the default. In the
backend, you can access the value using something like -
PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
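
A rough sketch of that flow, in Python purely for illustration (Pig itself is
Java). Only the property name pig.datetime.default.tz comes from the comment
above; the function names and the plain dict standing in for the job
configuration are assumptions made for this example.

```python
TZ_PROPERTY = "pig.datetime.default.tz"  # property name from the comment above


def build_job_conf(user_properties, client_default_tz):
    """Client side: resolve the default timezone once and store it in the conf.

    `user_properties` is a dict of properties set by the user, and
    `client_default_tz` is the timezone of the client machine (both hypothetical
    inputs for this sketch).
    """
    conf = dict(user_properties)
    # A value supplied by the user wins; otherwise use the client's timezone.
    conf.setdefault(TZ_PROPERTY, client_default_tz)
    return conf


def backend_default_tz(job_conf):
    """Backend side: every task reads the same value from the job conf."""
    return job_conf[TZ_PROPERTY]
```

Because the value is resolved once on the client and shipped with the job, the
tasks do not depend on the timezone configured on the node they run on.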

Add DateTime Support to Pig
---

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-07 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Attachment: PIG-2662.2.patch

PIG-2662.2.patch - This patch fixes compile error in previous one (conf 
variable is not declared). Running tests with this one. 

 skew join does not honor its config parameters
 --

 Key: PIG-2662
 URL: https://issues.apache.org/jira/browse/PIG-2662
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.2, 0.10.0
Reporter: Thejas M Nair
 Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch


 Skew join can be configured using pig.sksampler.samplerate and 
 pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
 config values from properties (PoissonSampleLoader.computeSamples) is not 
 getting called. 


[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively

2012-07-27 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424140#comment-13424140
]

Thejas M Nair commented on PIG-2829:

Thanks for the benchmark Jie. Clearly, partial-agg is working better than
combiner.
Can you also run some benchmarks with combiner turned off, so that we can
verify the appropriate value for pig.exec.mapPartAgg.minReduction -

||query || combiner off, partial-agg off || combiner off, partial-agg on ||
|g-by with reduction by 3 | | |
|g-by with reduction by 2| | |

Use partial aggregation more aggresively

Key: PIG-2829
URL: https://issues.apache.org/jira/browse/PIG-2829
Project: Pig
Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Jie Li
Attachments: 2829.1.patch, 2829.2.patch, 2829.separate.options.patch,
pigmix-10G.png, tpch-10G.png

Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature
in Pig 0.10 that performs aggregation within the map function. Its main
advantage over the combiner is that it avoids de/serializing and sorting the
data, and it can automatically disable itself if the data reduction rate is
low. Currently it's disabled by default.
To leverage the power of PartialAgg more aggressively, several things need to
be revisited:
1. The threshold of auto-disabling. Currently each mapper looks at the first 1k
(hard-coded) records to see if there's enough data size reduction (defaults
to 10x, configurable). The check would happen earlier if the hash table gets
full before processing the 1k records (hash table size is controlled by
pig.cachedbag.memusage). We might want to relax these thresholds.
2. Dependency on the combiner. Currently the PartialAgg won't work without a
combiner following it, so we need to provide separate options to enable each
independently.
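
The auto-disabling check in item 1 can be sketched roughly as below. This is an
illustration in Python, not Pig's POPartialAgg code: it measures reduction by
record count rather than data size, it leaves out the "hash table is full"
trigger, and the function and parameter names are made up for the example.

```python
from collections import defaultdict


def map_side_sum(records, check_after=1000, min_reduction=10):
    """Sum values per key in a hash table inside the map function.

    After `check_after` input records, compare the number of records seen with
    the number of distinct keys. If the reduction factor is below
    `min_reduction`, flush the table and pass the remaining records through
    unaggregated. Returns (output_pairs, still_enabled).
    """
    table = defaultdict(int)
    output = []
    enabled = True
    for seen, (key, value) in enumerate(records, start=1):
        if not enabled:
            # Partial aggregation was turned off: emit records as they come.
            output.append((key, value))
            continue
        table[key] += value
        if seen == check_after and seen / len(table) < min_reduction:
            # Not enough reduction: flush what we have and disable ourselves.
            enabled = False
            output.extend(table.items())
            table.clear()
    output.extend(table.items())
    return output, enabled
```

With 2000 records spread over 5 keys the table stays enabled and emits 5 pairs;
with 2000 distinct keys it disables itself at the 1000th record and emits all
2000 records.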

[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively

2012-07-26 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423602#comment-13423602
]

Thejas M Nair commented on PIG-2829:

I will review the patch soon. Some comments regarding the default configuration
-

bq. 2: changes existing default values:
After thinking about the multi-query use case, where you can have multiple
POPartialAgg operators in a map task, I am having second thoughts on turning
partial agg on by default. Can you try these settings on queries where around
10+ group+agg get combined into a single MR job? Maybe we should address the
potential OOM issues for this use case before we change the defaults. This is
likely to become a bigger issue when we use 100k records to decide whether to
turn partial aggregation on or off.

bq. 3: adds a property pig.exec.mapPartAgg.reduction.checkinterval which
defaults to 100k, so after processing every 100k records mapagg will check the
reduction rate to see if it should be disabled. Previously we only look at
first 1000 records.
Can you do some benchmarks to see if there is any noticeable difference in
runtime because of the delay in turning mapPartAgg off?

Use partial aggregation more aggresively

Key: PIG-2829
URL: https://issues.apache.org/jira/browse/PIG-2829
Project: Pig
Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Jie Li
Attachments: 2829.1.patch, 2829.separate.options.patch,
pigmix-10G.png, tpch-10G.png

[jira] [Commented] (PIG-2826) Training link on front page no longer points to Pig training

2012-07-20 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419589#comment-13419589
 ] 

Thejas M Nair commented on PIG-2826:


+1

 Training link on front page no longer points to Pig training
 

 Key: PIG-2826
 URL: https://issues.apache.org/jira/browse/PIG-2826
 Project: Pig
  Issue Type: Bug
  Components: site
Affects Versions: site
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: site

 Attachments: PIG-2826.patch


 The training link on Pig's website used to point to a Pig specific video on 
 Cloudera's site.  It now points to a list of all their videos.  Also, at the 
 time they were the only ones providing training videos for Hadoop.  Now other 
 vendors do as well.  This link should be replaced by a link to a wiki page 
 where vendors who wish to can list their training resources.


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-11 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412452#comment-13412452
]

Thejas M Nair commented on PIG-1314:

Zhijie,
I have added comments on your latest patch in
https://reviews.apache.org/r/5414/.
Yes, let's focus on test cases now, so that we can get an initial version
committed.

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch,
joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-03 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405971#comment-13405971
]

Thejas M Nair commented on PIG-1314:

PigStorage is meant to be a human readable format. So that is another reason to
store the timestamp in the ISO string as you suggested.
Yes, if the timezone is specified in the string, pig should use that value. But
the timezone part and time part of the datetime string should be optional. Does
jodatime support that?
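
To make the desired behaviour concrete, here is a small sketch using Python's
standard library. It does not answer the Joda-Time question; it only shows what
"optional time part and optional timezone part" would mean for a parser. The
function name and the UTC fallback are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(text, default_tz=timezone.utc):
    """Parse an ISO 8601 string whose time and timezone parts are optional."""
    parsed = datetime.fromisoformat(text)  # a date-only string becomes midnight
    if parsed.tzinfo is None:
        # No offset in the string: fall back to the configured default.
        parsed = parsed.replace(tzinfo=default_tz)
    return parsed
```

A string such as 2012-07-03T10:15:00+05:30 keeps its own offset, while
2012-07-03 and 2012-07-03T10:15:00 pick up the default.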

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403461#comment-13403461
]

Thejas M Nair commented on PIG-2774:

bq. I'd like to avoid having the user encode these details in the pig script.

Floating some more ideas -

A more performant way of doing this would be to stop accumulating tuples for a
join key value from the left relation into memory when a certain memory
threshold is exceeded. Once the join of these tuples against the right relation
is done, discard the accumulated left relation tuples for the join key, load a
new set, go back to the start of the tuples with this join key in the right
relation, and continue.
To go back more efficiently to the start of the join key in the right relation,
we can keep track of its record offset. This approach will have no additional
writes and less IO overall. The right relation block hopefully gets into the OS
cache.
But this approach can result in some map tasks being much slower than others.
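
The idea above can be sketched as follows. This is an illustration in Python
over in-memory lists, not Pig's merge join: the memory threshold is a simple
tuple count, the "record offset" is a list index, and all names are
hypothetical.

```python
def chunked_merge_join(left, right, max_buffered=2):
    """Merge-join two lists of (key, value) pairs, both sorted by key.

    At most `max_buffered` left tuples for one key are held in memory. When a
    key has more, the matching run in `right` is rescanned from its recorded
    start offset for each further chunk.
    """
    out = []
    li = ri = 0
    while li < len(left):
        key = left[li][0]
        # Advance right to the first tuple with key >= current key.
        while ri < len(right) and right[ri][0] < key:
            ri += 1
        run_start = ri  # remembered offset of this key in the right relation
        # Process the left tuples for this key one chunk at a time.
        while li < len(left) and left[li][0] == key:
            chunk = []
            while (li < len(left) and left[li][0] == key
                   and len(chunk) < max_buffered):
                chunk.append(left[li][1])
                li += 1
            rj = run_start  # go back to the start of this key on the right
            while rj < len(right) and right[rj][0] == key:
                out.extend((key, lv, right[rj][1]) for lv in chunk)
                rj += 1
    return out
```

The output contains the same tuples as an unbounded merge join, only in a
different order within a key, at the cost of reading the right-hand run once
per chunk.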

Another option is to write the left side join key values that didn't fit into
memory onto hdfs in separate files, one file for each chunk that is expected
to fit into memory, and have another round of MR jobs do the merge join on
these files. (I think hive has a skew join impl along similar lines.) This
would involve changing the MR plan at runtime.

Fix merge join to work with many duplicate left keys

Key: PIG-2774
URL: https://issues.apache.org/jira/browse/PIG-2774
Project: Pig
Issue Type: Bug
Reporter: Aneesh Sharma

A merge join can throw an OOM error if the number of duplicate left tuples is
large as it accumulates all of them in memory. There are two solutions around
this problem:
1. Serialize the accumulated tuples to disk if they exceed a certain size.
2. Spit out join output periodically, and re-seek on the right hand side
index.

[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403480#comment-13403480
]

Thejas M Nair commented on PIG-2774:

bq. we might have other operations queued up after the join

In the 2nd approach, the operations within the map task don't complicate
things. But to handle a reduce after the merge-join, we would need to introduce
another map task that does a union of the merge-join results. For example, if
the merge-join is followed by a group+agg, then the following transformation of
the plan would be needed:
Map(merge-join + group+agg ops) + Reduce(group+agg ops)
= Map(merge-join wave 1 + group+agg ops) + Map(merge-join wave 2 +
group+agg ops) + Map(union of first 2 maps) + Reduce(group+agg ops)

This transformation can't happen dynamically - we can't decide to skip the
reduce while in the map phase.

To handle this case dynamically, it looks like the first approach is the one
that would actually work! The user or a metadata system could possibly identify
the skew problem and recommend using a 'skew-merge' join the next time a query
is run on similar data.

Fix merge join to work with many duplicate left keys

Key: PIG-2774
URL: https://issues.apache.org/jira/browse/PIG-2774
Project: Pig
Issue Type: Bug
Reporter: Aneesh Sharma

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-27 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402467#comment-13402467
]

Thejas M Nair commented on PIG-1314:

bq. Or we temporally set aside the performance issue right now, and move
forward to make timezone serialization work by simply serializing the timezone
id string.
We can add features later, but dropping features later won't be good. In my
opinion, the support for long timezone name is not going to be needed by most
people. I think we can support it only for creating a DateTime field, but say
that pig will not preserve the long name. Pig will only retain hours+minute
offset (no seconds and milliseconds!). The hour+min offset form is portable and
more likely to be supported by other serialization formats.
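
A minimal sketch of what "retain only the hour+minute offset" could look like
on the wire, in Python for illustration only. The 10-byte layout (8-byte UTC
milliseconds plus a 2-byte offset in minutes) is an assumption made for this
example, not the format Pig uses.

```python
import struct
from datetime import datetime, timedelta, timezone

_FORMAT = ">qh"  # 8-byte millis since epoch (UTC) + 2-byte offset in minutes
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def serialize(dt):
    """Pack an aware datetime as UTC millis plus its offset in whole minutes."""
    millis = (dt - _EPOCH) // timedelta(milliseconds=1)
    offset_minutes = dt.utcoffset() // timedelta(minutes=1)
    return struct.pack(_FORMAT, millis, offset_minutes)


def deserialize(data):
    """Rebuild the datetime; a long timezone name is gone, the offset remains."""
    millis, offset_minutes = struct.unpack(_FORMAT, data)
    tz = timezone(timedelta(minutes=offset_minutes))
    return (_EPOCH + timedelta(milliseconds=millis)).astimezone(tz)
```

Round-tripping a value created with a +05:30 offset gives back the same instant
and the same local field values, but nothing that identifies a named zone.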

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-27 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402722#comment-13402722
 ] 

Thejas M Nair commented on PIG-2774:


If the left side relation's tuples for a value of the join key are serialized
to disk, then for every value of the join key in the right relation, it will
hit the disk. That will perform very poorly.
Looks like what we need is something like a merge-skew join, i.e., similar to
skew join, sample the left side, and partition the splits for map tasks based
on the sampled information.

 Fix merge join to work with many duplicate left keys
 

 Key: PIG-2774
 URL: https://issues.apache.org/jira/browse/PIG-2774
 Project: Pig
  Issue Type: Bug
Reporter: Aneesh Sharma

 A merge join can throw an OOM error if the number of duplicate left tuples is 
 large as it accumulates all of them in memory. There are two solutions around 
 this problem:
 1. Serialize the accumulated tuples to disk if they exceed a certain size.
 2. Spit out join output periodically, and re-seek on the right hand side 
 index.


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401613#comment-13401613
]

Thejas M Nair commented on PIG-1314:

bq. As far as I know, either Java builtin Date or Joda DateTime uses
millisecond-shift (stored in a long integer variable) from the midnight UTC,
which is not exactly the Unix time.
Yes, as you noted, the difference is that a unix timestamp can store up to +/-
292 billion years, while Joda DateTime supports only +/- 292 million years.
Which should be sufficient for most practical purposes! :)
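
The two figures follow from the range of a signed 64-bit integer, counted in
seconds in one case and in milliseconds in the other. A quick check in Python
(the 365.25-day year is an approximation chosen for this estimate):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, fine for an estimate
MAX_SIGNED_64 = 2**63 - 1

# Range when the 64-bit value counts seconds (unix timestamp).
years_if_seconds = MAX_SIGNED_64 / SECONDS_PER_YEAR
# Range when the 64-bit value counts milliseconds (Joda DateTime, java Date).
years_if_millis = MAX_SIGNED_64 / 1000 / SECONDS_PER_YEAR

print(f"seconds:      about {years_if_seconds / 1e9:.0f} billion years")
print(f"milliseconds: about {years_if_millis / 1e6:.0f} million years")
```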

bq. The time zone only determines the ISO time string,
It also affects the field values (getDayOfWeek(), getHourOfDay(), etc.). In
your data, you can have dates belonging to different timezones, and users might
want to retain that information.
An example of a use case where the timezone also needs to be stored: if you
want to analyze how many people come to a global website during their morning
hours, you want getHourOfDay() to return the hour as per the local timezone.
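
A small illustration of that point, using Python's datetime with fixed offsets
rather than Joda-Time. The helper name and the sample offsets are made up for
this example.

```python
from datetime import datetime, timedelta, timezone

# One instant in time, expressed in UTC.
instant = datetime(2012, 6, 26, 14, 0, tzinfo=timezone.utc)


def hour_of_day(dt, offset_hours):
    """Hour-of-day field of `dt` as seen at the given UTC offset."""
    return dt.astimezone(timezone(timedelta(hours=offset_hours))).hour


for offset in (-7, 0, 9):
    print(offset, hour_of_day(instant, offset))
```

The same instant falls in the morning at UTC-7 and late in the evening at
UTC+9, so dropping the stored offset would make a "morning visits" query count
the wrong records.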

We need an efficient way to serialize the timezone along with the long. Can you
propose something? (Maybe just make it efficient for the 256 most 'popular'
timezones and store it in a byte, and not have the byte for UTC. For other
timezones, add a timezone string?)

bq. When we need to convert the DateTime object to Unix time string, we may use
the default time zone of the Pig environment
If the date field has the timezone value in it, we don't have to rely on the
default time zone to convert to a unix timestamp (assuming that is what you
meant by 'unix time *string*').
But for udfs like DateTime ToDate(String s), where the timezone might not be
specified, we need a default timezone. I think we should use the default
timezone on the pig client machine. Using the default time zone on each task
tracker node can lead to a nightmare in debugging if one of the nodes happens
to have a different timezone. We should allow the user to set a default
timezone using a pig property.

bq. We probably need one more UDF String ToString(DateTime d, String format,
String timezone)
Having timezone argument in this call is necessary only if user wants to print
the time for a different timezone. This is useful, but not mandatory.

bq. Since the ISO duration is non-negative (Please correct me if I'm wrong), we
need to SubstractDuration as well.
Yes, you are right. I could not find any references to negative values in ISO
duration. Let's add SubstractDuration.

Trivia from Wikipedia: a 64-bit unix timestamp, in the negative direction, goes
back more than twenty times the age of the universe.

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401937#comment-13401937
]

Thejas M Nair commented on PIG-1314:

bq. Several time zones may share the same UTC offset, such that when the
reverse operation is to do, it will be unknown which timezone the UTC offset
should be converted to.
Yes, it will be lossy, but the part that is important for date calculations is
preserved. The ISO spec only has an offset for the timezone. I don't think we
have to allow the datetime field to be used for storing location information.
Does JodaTime preserve the location string?

bq. I'm not sure whether getAvailableIDs returns the same time zone list on
all machines or is machine-dependent.
It depends on the release/jar
(http://joda-time.sourceforge.net/tz_update.html). As pig will be shipping this
jar to the nodes, it is ok to assume that it will be the same across all nodes
for a query. So it is safe to rely on the id for intermediate serialization.
But won't jodatime support a timezone outside this list, if the user specifies
a date using the UTC offset format?

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-25 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400299#comment-13400299
]

Thejas M Nair commented on PIG-1314:

bq. 1. Pig can also import into and export from HBase storage, which also
doesn't have the primitive DataTime. Throw exception in this case as well,
correct?
Yes. The exception should be thrown from HBaseStorage.

bq. if we conclude the design for Avro, we should keep to it for the others.
Please note that pig does not have a way of knowing whether the format will
support datetime. The behavior will be controlled by the storage func
implementation. But for the ones that are part of the pig codebase, I think we
should throw an exception.

bq. 3. DateTime is serialized as a Long value (Unix timestamp) when it is
necessary.
JodaTime supports milliseconds as well. Will we be able to convert all values
within the limits of a JodaTime date into a long?

bq. the output datatype of DiffDate(DateTime d1, DateTime d2) should use long
instead of int, because the diff may be too large for int range to conver.
Makes sense, we should use a type that is appropriate for range.

bq. what does DateTime DateAdd(DateTime d1) mean? Adding datetime based on the
current time?
Not sure. Daniel, do you know ?

bq. we allow explicit cast between datetime and string, correct? Similarly, do
we allow explicit cast between datetime and long/int (representing unix
timestamp)?
Yes, we should support explicit cast between these types. Though conversion to
int might not be successful for all datetime values.

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-25 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400705#comment-13400705
]

Thejas M Nair commented on PIG-1314:

bq. what does DateTime DateAdd(DateTime d1) mean? Adding datetime based on the
current time?
Discussed this with Daniel. I think it makes sense to replace this with
different functions -
// add the number of years specified in the years param to the DateTime date.
// The years param can be positive or negative
AddYears(DateTime date, int years);

Similarly we should have AddMonths, AddDays, AddHours ...
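
A sketch of what such functions could look like, in Python for illustration.
The names mirror the proposal, but the bodies, and in particular the handling
of Feb 29 in add_years, are assumptions of this example rather than agreed
behaviour.

```python
from datetime import datetime, timedelta


def add_days(date, days):
    """Add `days` (positive or negative) to `date`."""
    return date + timedelta(days=days)


def add_hours(date, hours):
    """Add `hours` (positive or negative) to `date`."""
    return date + timedelta(hours=hours)


def add_years(date, years):
    """Add `years`; Feb 29 is clamped to Feb 28 in a non-leap target year."""
    try:
        return date.replace(year=date.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return date.replace(year=date.year + years, day=28)
```

A negative argument simply moves the date backwards, so no separate
subtraction functions are needed for these units.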

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h
